METHODS OF CLONING AND PRODUCING FRAGMENT CHAINS WITH READABLE
INFORMATION CONTENT

The present invention relates to new methods of
attaching first and second nucleic acid molecules,
particularly methods of cloning in which adapter
molecules mediate the binding between the first and
second molecules, to the resultant nucleic acid molecules
thus formed, to methods of generating DNA with a readily
readable information content and to kits for performing
such methods.

Presently known cloning methods generally involve
the use of restriction enzymes which are used to
generate fragments for insertion and to cleave vectors to
produce corresponding and hence complementary terminal
sequences. Generally, the enzymes which are used cut
palindromic sequences and thus produce identical
overhangs. Different sequences that are cut with the
same restriction endonuclease can then be ligated
together to form new, recombinant nucleic acids.

However, such methods suffer from a number of
limitations. One disadvantage in using endonucleases
that form two identical overhangs is the formation of
different products on ligation. If for example two
fragments A and B are to be ligated, as a consequence of
common overhangs the products A+A and B+B as well as the
desired A+B will be produced. Other by-products
resulting from other fragments produced when A and B
were formed will also be generated, e.g. by reassociation
into the original positions. It is therefore normal to
use a separation process using agarose gels. The
separation procedure however often results in a
considerable loss of DNA.
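
By way of illustration only, the by-product problem described
above may be sketched in Python. The sketch assumes one 5'
overhang per fragment, written 5' to 3', and treats two ends as
joinable when one overhang is the reverse complement of the
other. The function names and the second pair of overhangs are
hypothetical; AATT is the palindromic overhang left by EcoRI.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def possible_joins(ends):
    """List every pair of fragments whose overhangs can anneal.

    `ends` maps a fragment name to its overhang (5'->3').
    A fragment may also pair with a second copy of itself.
    """
    joins = []
    for a, b in combinations_with_replacement(sorted(ends), 2):
        if ends[a] == revcomp(ends[b]):
            joins.append((a, b))
    return joins


# Identical palindromic overhangs: A+A and B+B form alongside A+B.
identical = {"A": "AATT", "B": "AATT"}
print(possible_joins(identical))

# Non-identical, mutually complementary overhangs: only A+B forms.
unique = {"A": "AGTC", "B": "GACT"}
print(possible_joins(unique))
```

With the palindromic overhang all three products are listed,
whereas with non-identical overhangs only the desired A+B pair
remains.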

Such methods necessarily suffer from various
limitations including the by-products mentioned above,
and the need to identify the desired end-products, e.g.
if only a particular insert is to be cloned.

Other cloning techniques have been used in which
cloning has been performed using PCR techniques, e.g. in
which the PCR primers have IIS enzyme recognition sites.
However, the use of PCR is disadvantageous in cloning
techniques as it is time consuming and requires
purification steps which result in significant loss of
yield. The PCR reaction may also introduce point
mutations and the like and the length of the fragment is
limited by the polymerase capacity, e.g. to a maximum of
approximately 50kb.

It has now surprisingly been found that by
generating fragments with unique single stranded regions
and then mediating the binding between a first and
second nucleic acid molecule, many of these
disadvantages may be avoided. In this method,
restriction nucleases are used that form non-identical
overhangs, e.g. type IP or IIS restriction
endonucleases. As will be appreciated, if one uses a
restriction endonuclease that makes overhangs of 4 base
pairs, each fragment that is formed will have two
overhangs of 4 base pairs each. It is theoretically
possible therefore that 4^8 (i.e. 65,536) fragments may be
formed with different combinations of the two overhangs.
Thus, as a rule, each fragment formed on cleavage will
have a unique pair of overhangs even when cleaving large
nucleic acid molecules.
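
The figure of 65,536 follows from there being four possible
bases at each of the eight overhang positions (two overhangs of
four bases each). A minimal Python sketch of this arithmetic is
given below; the function names are hypothetical and serve only
to illustrate the count.

```python
from itertools import product

BASES = "ACGT"


def overhang_pair_count(overhang_len):
    """Number of distinct (left, right) overhang pairs for one fragment."""
    return len(BASES) ** (2 * overhang_len)


def enumerate_pairs(overhang_len):
    """Explicitly list every (left, right) overhang pair."""
    overhangs = ["".join(p) for p in product(BASES, repeat=overhang_len)]
    return [(left, right) for left in overhangs for right in overhangs]


print(overhang_pair_count(4))       # 4**8 = 65536
print(len(enumerate_pairs(4)))      # the explicit listing agrees
```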

These unique overhangs may then be addressed and
adjusted appropriately using adapters with two
overhangs. For example in a cloning technique one of
the overhangs is made to correspond to the overhang on
the insert and the other overhang is made to correspond
to the overhang on the vector into which the insert is
to be introduced. This method is outlined in Figure 1.
In that case the DNA molecule containing the insert is
cut with a restriction endonuclease which makes an
overhang on each side of the insert. Each of the many
fragments which are formed has different overhangs such
that the two overhangs at either end of the insert are
unique. Ligase is then added to bind two adapters with
corresponding single stranded regions. This leads to
the formation of two new overhangs at the termini of the
insert, which are selected such that they can be used to
bind to the vector into which the insert is to be
cloned. Providing identical overhangs are not created
on other molecules, only the desired insert will be
ligated to the adapters. In the final step the insert
is ligated into the vector which has two overhangs which
complement the adapters' overhangs. The overhangs in
the vector may be constructed using the same principles
as described for the insert.

Thus in this new method, an adapter molecule is
used which is complementary to a single stranded region
generated on the first nucleic acid molecule and
therefore binds to that molecule, but has a different
single stranded region at its other terminus, thus
effectively modifying the single stranded region
presented for binding by the first nucleic acid molecule
fragment. The adapter's free single stranded region may
then mediate the binding of the first nucleic acid
molecule fragment to a second nucleic acid molecule
exhibiting a complementary single stranded region.

This method of mediation has particular
applications for effectively identifying and selecting a
first nucleic acid molecule fragment and then mediating
its binding to a second nucleic acid molecule where this
was not previously possible.

Of particular relevance to methods of cloning is
the generation of fragments for cloning which have
different single stranded regions at their termini
relative to other fragments, which may then be selected
and cloned into an appropriate vector. As described
herein, such fragments are generated by the use of
enzymes which cleave outside their recognition site and
thus produce overhangs that depend on the sequence
surrounding the recognition site which is likely to vary
from fragment to fragment.
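
The dependence of the overhang on the sequence flanking the
recognition site may be illustrated with the following Python
sketch. It is a simplified model only: the default parameters
are modelled on FokI (recognition site GGATG, cutting 9 bases
downstream on one strand and 13 on the other), only sites on the
strand as written are considered, and the function name and test
sequence are hypothetical.

```python
def type_iis_overhangs(seq, site="GGATG", top_offset=9, bottom_offset=13):
    """Return the bases left single stranded after each cut.

    Each overhang is read 5'->3' on the strand as written and lies
    between the two staggered cut positions downstream of the site.
    """
    overhangs = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset
        bottom_cut = site_end + bottom_offset
        if bottom_cut <= len(seq):          # ignore sites too near the end
            overhangs.append(seq[top_cut:bottom_cut])
        start = seq.find(site, start + 1)
    return overhangs


# Two identical recognition sites with different downstream sequence.
sequence = ("AA" + "GGATG" + "ACGTACGTA" + "TTGC"
            + "CCCC" + "GGATG" + "TTTTTTTTT" + "GACT" + "AA")
print(type_iis_overhangs(sequence))   # ['TTGC', 'GACT']
```

Although the same recognition site occurs twice, the two
overhangs differ because they are taken from the surrounding
sequence rather than from the site itself.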

Such techniques may be used to direct only a single
fragment to a particular vector or may be used to direct
different fragments to different sites or indeed
different vectors, even within the same reaction mix,
providing appropriate adapters are constructed.

These methods have particular advantages over prior
art methods. In particular, the whole procedure may be
carried out in one or two steps, e.g. cutting and
ligating simultaneously or cutting and ligating
separately. Even in instances where the procedure is
performed in two steps, it will often be possible to
perform both steps in the same buffer, e.g. since T4 DNA
ligase is known to work well in most buffers for
restriction endonucleases. Time- and resource-consuming
precipitation procedures may therefore be avoided.
Moreover, ligations can be performed with overhangs of
4-6 bases, unlike conventional cloning where overhangs
of 0-4 bases are used, thereby increasing ligation
efficiency considerably.

Furthermore, the need to carry out gel separations
may be avoided. The quantity of DNA required initially
can be reduced substantially. Mutation of DNA molecules
on UV exposure, a common occurrence in gel separation,
may also be avoided. Furthermore, laboratory staff are
not exposed to carcinogenic EtBr. Also, separation
problems which can occur when restriction cleavage
results in fragments of similar size may be avoided.
The frequency of undesirable side-products such as empty
vectors, too many inserts or incorrect orientation of
the inserts may also be reduced.

Since it is generally not problematic if the insert
is cleaved, a small selection, e.g. of type IIS or IP
restriction endonucleases, could provide far more cloning
possibilities than a corresponding selection of ordinary
type II restriction endonucleases used for conventional
cloning procedures. Having a few type IIS, IP and
similar restriction endonucleases that cleave with high
frequency allows for many cloning possibilities.

In the specific instance of cloning of large DNA
molecules (e.g. genomic DNA) or a solution containing
many different DNA molecules in parallel (e.g. a cDNA
library) it is very difficult to use conventional
methods. If for example a large DNA molecule is cleaved
with EcoRI, a large number of fragments may be formed
with the same overhang, and in addition a considerable
proportion of these fragments may be of roughly the same
size. This may lead to the formation of a large number
of undesired ligation products, even with gel
separation. Moreover, gel separation can be difficult
if the insert is large. Furthermore, it is also often
difficult, or even impossible, to find restriction
endonucleases that will not cut large inserts. These
problems may be reduced or eliminated using the cloning
procedure described herein.

If necessary, it is possible to increase the number
of base pairs in the overhangs to (e.g.) 6 by using CjeI
or similar endonucleases to form an even greater number
of possible combinations and thus increase the probability
of producing unique overhangs.
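
The effect of overhang length on uniqueness may be estimated
with the following Python sketch. It rests on a simplifying
assumption which is not part of the method itself, namely that
every fragment carries two overhangs drawn independently and
uniformly at random; real sequences will deviate from this. The
function name is hypothetical.

```python
def prob_pair_unique(n_fragments, overhang_len):
    """Probability that one fragment's overhang pair is shared by no other.

    Assumes each of the other (n_fragments - 1) fragments has an
    independent, uniformly random pair of overhangs.
    """
    combinations = 4 ** (2 * overhang_len)
    return (1 - 1 / combinations) ** (n_fragments - 1)


for length in (4, 5, 6):
    print(length, prob_pair_unique(1000, length))
```

Under this assumption the probability of a unique pair rises
towards 1 as the overhang is lengthened from 4 to 6 bases.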

The advantages of the method of the invention are
even greater in complex cloning procedures. If several
adapters are used for example, it is possible to clone
many different inserts into one and the same vector at a
corresponding number of different sites in one and the
same reaction, as described hereinafter in more detail.
Deletions of small or large fragments may also be
achieved using the same basic principle. This opens up
the possibility of making complex recombinations of
inter alia genomic DNA (removal of endogenous viruses in
genomes to be used for xenotransplantation, the
insertion of a large number of genes from other genomes,
new combinations of genes etc.). The method can also be
used for exon-shuffling and other recombinations that
are relevant in connection with artificial evolutionary
systems.

Thus, in a first aspect, the present invention
provides a method of attaching a fragment of a first
nucleic acid molecule to a second nucleic acid molecule,
wherein said method comprises at least the steps:

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create at least one fragment of said
first nucleic acid molecule having a single stranded
nucleotide region (SS1a) at at least one terminus of
said fragment,

2) if necessary generating a single stranded
nucleotide region (SS2) at at least one terminus of said
second nucleic acid molecule,

3) binding to at least one single stranded region of
step 1) (SS1a) an adapter molecule comprising at one
terminus a single stranded region (SSA1) complementary
to the single stranded region of said first nucleic acid
molecule fragment (SS1a) and additionally comprising at
the other terminus a further single stranded region
(SSA2) complementary to the single stranded region (SS2)
at one terminus of said second nucleic acid molecule,

4) ligating said adapter to said first nucleic acid
fragment,

5) binding said adapter to said second nucleic acid
molecule, and

6) ligating said adapter to said second nucleic acid
molecule.
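
The relationship between the single stranded regions named in
steps 1) to 6) may be sketched in Python as follows. This is an
illustrative model only: overhangs are written 5' to 3', exact
complementarity is assumed, and the function names and example
sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def can_bind(overhang_a, overhang_b):
    """True if two overhangs (each 5'->3') are exactly complementary."""
    return overhang_a == revcomp(overhang_b)


def design_adapter(ss1a, ss2):
    """Choose SSA1 to pair with SS1a and SSA2 to pair with SS2."""
    return {"SSA1": revcomp(ss1a), "SSA2": revcomp(ss2)}


ss1a = "AGTC"   # overhang on the first molecule fragment (hypothetical)
ss2 = "GGCA"    # overhang on the second molecule (hypothetical)

print(can_bind(ss1a, ss2))            # False: no direct ligation
adapter = design_adapter(ss1a, ss2)
print(adapter)                        # {'SSA1': 'GACT', 'SSA2': 'TGCC'}
print(can_bind(ss1a, adapter["SSA1"]), can_bind(adapter["SSA2"], ss2))
```

The two molecules cannot pair directly, but each pairs with one
end of the adapter, which is the mediation described above.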


As used herein, said first and second nucleic acid
molecules are any naturally occurring or synthetic
polynucleotide molecules, e.g. DNA, such as genomic or
cDNA, PNA and their analogs, which are double stranded
and in which single stranded regions may be generated.

Fragments of the first nucleic acid molecule are
generated by use of a nuclease which cleaves outside its
recognition site. One or more fragments may be
generated depending on the sites which are cleaved (e.g.
if the site is at the extreme end of the molecule only a
few bases may be removed rather than the production of 2
fragments). Other nucleic acid molecule fragments
described herein may be generated by any appropriate
means, as mentioned herein, including the techniques
used to produce the first nucleic acid molecule
fragments. Fragments are preferably more than 10 bases,
e.g. 10 to 200bp, preferably more than 100 bases in
length. For cloning applications, fragments having
lengths in excess of 200 bases, e.g. from 200 bases to
2kb may be used. Where longer single stranded regions
are generated, fragments of longer lengths are also
contemplated, e.g. 10-100kb or longer.

"Single stranded regions" as referred to herein are
regions of overhang at the end, i.e. at the terminus of
the first, second or third nucleic acid molecules or
adapter molecules. These regions are sufficient to
allow specific binding of molecules having complementary
single stranded regions and subsequent ligation between
these molecules. Thus, the single stranded regions are
at least 1 base in length, preferably 3 bases in length,
but more preferably at least 4 bases, e.g. from 4 to 10
bases, e.g. 4, 5 or 6 bases in length. Single stranded
regions up to 20 bases in length are contemplated which
will allow the use of fragments in the method of the
invention which are up to Mb in length.

"Binding" as used herein refers to the step of
association of complementary single stranded regions
(i.e. non-covalent binding). Subsequent "ligation" of
the sequences achieves covalent binding.

"Complementary" as used herein refers to specific
base recognition via for example base-base
complementarity. However, complementarity as referred
to herein includes pairing of nucleotides in Watson-Crick
base-pairing in addition to pairing of nucleoside
analogs, e.g. deoxyinosine which are capable of specific
hybridization to the base in the nucleic acid molecules
and other analogs which result in such specific
hybridization, e.g. PNA, DNA and their analogs.
Complementarity of one single stranded region to another
is considered to be sufficient when, under the
conditions used, specific binding is achieved. Thus in
the case of long single stranded regions some lack of
base-base specificity, e.g. mis-match, may be tolerated,
e.g. if one base in a series of 10 bases is not
complementary. Such slight mismatches which do not
affect the ultimate binding and ligation of the single
stranded regions are considered to be complementary for
the purposes of this invention. The single stranded
regions may retain portions, on binding, which remain
single stranded, e.g. when overhangs of different sizes
are employed or the complementary portions do not
comprise all of the single stranded regions. In such
cases, as mentioned above, providing binding can be
achieved the single stranded regions are considered to
be complementary. In those cases, prior to ligation,
missing bases may be filled in e.g. using Klenow
fragment, or other appropriate techniques as necessary.

"Adapters" as referred to herein are molecules
which adapt the first nucleic acid molecule fragment for
binding to a second or third nucleic acid molecule.
Adapter molecules comprise at least two portions: a
first portion containing a single stranded region which
is complementary to the single stranded region on the
first nucleic acid molecule fragment and a second
portion containing a single stranded region which is
complementary to the single stranded region on the
second nucleic acid molecule. The single stranded
regions are as described hereinbefore and are preferably
on different strands making up the adapter molecule.
The above mentioned portions are at least as large as
the single stranded regions, e.g. 4 to 6 bases in
length, although they may be longer, e.g. up to 20 bases
in length.

A linking region between these single stranded
regions is required for the stability of the molecule.
Conveniently this comprises a double stranded nucleic
acid fragment, especially in methods of cloning where
amplification, replication and/or translation are to be
performed. However, this portion may be substituted by
any appropriate molecule depending on the end use of the
resulting ligated molecule. Clearly, to achieve
ligation between the first and second nucleic acid
molecules appropriate attachment points and moieties for
ligation must be provided.

The linking portion may serve more than just a
linking function and may for example provide sequences
appropriate for primer or probe binding, e.g. for
amplification or identification, respectively, or may
contain integration sites for mobile elements such as
transposons and the like. Depending on how the method
is performed, the adapters preferably do not contain
restriction sites for any restriction enzymes used in
the method of the invention, thus avoiding the need to
inactivate or remove the enzymes prior to the addition
of the adapters.

Conveniently adapter molecules may be exclusively
comprised of a nucleic acid molecule in which the
various properties of the adapter are provided by the
different regions of the adapter.

Conveniently adapters are made up of two
complementary oligonucleotides having between 10 and 100
bases each, e.g. between 20 and 50 bases.

In the method described above, preferably at least
one first nucleic acid molecule fragment is generated having
a single stranded region at either end (SS1a and SS1b)
to each of which an adapter binds.

Preferably the method described herein is used for
cloning. Thus, in the method described above, an
adapter is bound at either end of the first nucleic acid
molecule fragment (in which the adapters may be the same
or different), and the unbound end of the first adapter
is bound to the second nucleic acid molecule and the
unbound end of the second adapter binds either to the
second nucleic acid molecule (i.e. at the other end
distal to the binding of the first adapter, thereby
forming a circular molecule) or binds to a third nucleic
acid molecule. The first of these two alternatives may
arise through cleavage of a circular vector to give rise
to the second nucleic acid molecule to which the
[adapter 1]:[first nucleic acid molecule
fragment]:[adapter 2] insert is bound to re-circularize
the vector. Alternatively, a linear or circular vector
may be cleaved giving rise to two or more discrete
fragments (herein the second and third nucleic acid
molecules) which may be joined by the adapter 1:first
nucleic acid molecule:adapter 2.

Thus, in a preferred feature, a first nucleic acid
molecule fragment is generated which has a single
stranded nucleotide region at either terminus (SS1a and
SS1b), each of which is bound by an adapter, which may
be the same or different, and the first of said adapters
is bound to said second nucleic acid molecule and the
second of said adapters binds either to said second
nucleic acid molecule or to a third nucleic acid
molecule.

Thus, alternatively stated, in a preferred
embodiment, the present invention provides a method of
cloning a fragment of a first nucleic acid molecule into
a second nucleic acid molecule, wherein said method
comprises at least the steps:

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create one or more fragments of said
first nucleic acid molecule, wherein at least one
fragment has a single stranded nucleotide region at both





WO 01/00816 



PCT/GB00/02512 




termini (SS1a and SS1b),

2) cleaving said second nucleic acid molecule to
create at least two single stranded regions (SS2a and
SS2b) at the site of said cleavage (e.g. linearizing a
circular vector or producing fragments in a linear or
circular vector),

3) binding to one of the single stranded regions of
step 1) (SS1a)

a first adapter molecule comprising at one terminus
a single stranded region (SSA1) complementary to
the single stranded region of said first nucleic
acid molecule fragment (SS1a) and additionally
comprising at the other terminus a further single
stranded region (SSA2) complementary to one of the
single stranded regions (SS2a) produced by cleavage
of said second nucleic acid molecule, and

binding to a second single stranded region of step 1)
(SS1b)

a second adapter molecule as defined above which
binds to the second single stranded region of said
first nucleic acid molecule fragment (SS1b) and to
the second single stranded region (SS2b) produced
by cleavage of said second nucleic acid molecule,

4) ligating said adapters to said first nucleic acid
fragment,

5) binding said adapters to said second nucleic acid
molecule or fragments thereof, and

6) ligating said adapters to said second nucleic acid
molecule or fragments thereof.

In instances in which cleavage of the second
nucleic acid molecule results in the production of two
or more discrete fragments which become ligated to the
first nucleic acid molecule fragment via the adapters,
said fragments constitute second and third nucleic acid
molecules of the invention.

Preferably, to prevent concatemerisation of
[adapter:first nucleic acid fragment:adapter] units, the
single stranded regions of the second and third nucleic
acid molecules which bind to these adapters are not
complementary. Thus, for example, where cloning into a
vector is performed, preferably said vector is
linearized and at least a portion of said vector is
removed from one terminus of that vector, e.g. at least
two cleavage events occur.
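
The condition stated above may be checked for a given pair of
free ends with the following Python sketch. It is a simplified
illustration assuming exact complementarity between overhangs
written 5' to 3'; the function name and example sequences are
hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concatemer_risks(free_end_1, free_end_2):
    """List the ways in which adapter-fragment-adapter units could chain."""
    risks = []
    if free_end_1 == revcomp(free_end_1):
        risks.append("free end 1 is self-complementary")
    if free_end_2 == revcomp(free_end_2):
        risks.append("free end 2 is self-complementary")
    if free_end_1 == revcomp(free_end_2):
        risks.append("free ends 1 and 2 are complementary to each other")
    return risks


print(concatemer_risks("AATT", "AATT"))   # three risks reported
print(concatemer_risks("AGTC", "GGCA"))   # no risk reported
```

An empty list indicates that, within this simplified model, the
units cannot join to one another and can only ligate to the
vector ends for which they were designed.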

In such methods, particularly for cloning, the
second nucleic acid molecule, e.g. into which a first
nucleic acid molecule fragment is inserted, is
conveniently a vector (or a part thereof, e.g. where the
second and third nucleic acid molecules together
comprise the vector, and result through its cleavage).
Such vectors include any double stranded nucleic acid
molecule which may be linear or circular. (However, as
mentioned above in respect of the adapters, providing
single stranded regions exist, or are generated at the
termini of the second nucleic acid or its fragments
(e.g. the vector), the adjacent regions may be made up
of any molecule providing ligation at the termini to the
adapters is not compromised.)

Conveniently such vectors may contain sequences
which aid their use in methods of the invention or their
subsequent manipulation. Thus, vectors are conveniently
selected with only two or a small number of restriction
cleavage sites for the method of cleavage used. Thus
for example where restriction enzymes are used, the
vector is selected to include only a minimal number,
preferably only two, recognition sites for that enzyme.

Vectors may additionally comprise further portions
or sequences for cloning, selection, amplification,
transcription or translation as appropriate. Thus
vectors may be used with probe or primer sites, promoter
regions, other regulatory regions, e.g. expression
control sequences etc. Conveniently well-known cloning
vectors are employed, such as pBR322 and derived
vectors, pUC vectors such as pUC19, lambda vectors, BAC,
YAC and MAC vectors and other appropriate plasmids or
viral vectors.

The molecule of which a fragment is to be inserted,
i.e. the first nucleic acid molecule, may be any molecule
which can generate single stranded regions at at least
one of its ends using the nucleases described herein,
although the central portion may be varied as
appropriate. Preferably however such molecules are
double stranded nucleic acid molecules and contain
appropriate sites for the use of enzymes to create the
single stranded overhangs which are required in
accordance with the invention. Appropriately, the first
nucleic acid molecule is derived from genomic DNA and
the method of the invention is used to insert fragments
thereof into appropriate vectors.

Adapters which may be used range from short double
stranded nucleic acid molecules with single stranded
regions at their termini to longer molecules which may
contain further sequences, for example to allow selection
as described hereinafter. Appropriate single stranded
regions are selected on the basis of the terminal
sequence of the first, second and third nucleic acid
molecules or fragments thereof. Appropriate selection
may also be used to direct the orientation of the
insert, e.g. to produce clones which may be used to
produce antisense nucleic acid molecules.

Adapters may be used in the methods of the
invention in which their single stranded overhangs have
already been generated, e.g. by the combination of
single stranded complementary oligonucleotides which on
hybridization leave overhangs at either end, or by
appropriate cleavage or digestion.

Alternatively, during the method of the invention,
adapters may be modified to provide single stranded
portions, e.g. by the use of restriction enzymes or
other appropriate techniques during the course of the
reaction. Conveniently, to simplify the number of
steps, the enzymes used to generate single stranded
regions in the first, second or third nucleic acid
molecules (where necessary) may be used to generate the
adapter single stranded regions.

As mentioned previously, the single stranded region
may be 4 or more bases in length. When using longer
overhangs or where the sequence of the full
corresponding single stranded region of the first,
second or third nucleic acid molecules is not known or
unclear, a family of adapters with one or more
degenerate bases in the single stranded region may be
used, for example using methods to create libraries of
adapters. Degenerate bases may also be used at
positions prone to mis-match ligations.

For convenience a universal library of adapters may
be created for use in the method of the invention. Thus
for example, 16 different adapters with a 4 base-pair
overhang consisting of two random bases (NN) and two
bases specific to each adapter (e.g. AA, CC, ... TT) may
be created. In this way sufficient adapters may be
created which are capable of distinguishing between 16
different first molecule fragment overhangs, which would
suffice for many cloning purposes. Similarly a library
of second molecule overhangs, e.g. vector overhangs, may be
created.
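
The composition of such a 16 member library may be illustrated
by the following Python sketch. It assumes, for illustration
only, that the two specific bases occupy the last two positions
of the adapter overhang as written 5' to 3' and that the NN
positions pair with any base; the function names and this layout
are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def universal_library():
    """Return the 16 overhang patterns NNXY, one per specific base pair XY."""
    return ["NN" + "".join(pair) for pair in product(BASES, repeat=2)]


def adapter_for(fragment_overhang):
    """Pick the library pattern able to pair with a 4-base fragment overhang."""
    partner = revcomp(fragment_overhang)   # exact sequence that would anneal
    return "NN" + partner[2:]              # only the last two bases are fixed


library = universal_library()
print(len(library))            # 16
print(adapter_for("AGTC"))     # NNCT
```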

To increase the number of permutations in an
adapter library, two separate oligonucleotide libraries
may be generated, one with single stranded
oligonucleotides with regions that will correspond to
the single stranded region of the first nucleic acid
molecule fragment and the second library with single
stranded oligonucleotides with regions that will
correspond to the single stranded region of the second
nucleic acid molecule (e.g. vector). However, common
to each member of the libraries is a complementary region,
such that when one member from the first library is
selected and combined with a member of the second
library, they will hybridize leaving free the relevant
single stranded regions. Thus for example to generate
an adapter with an AA overhang and a TC overhang to bind
to the first and second nucleic acid molecules
respectively, members of the different libraries such as
GGGCCCCNNAA may be combined with TCNNNCCGGGG to form:

GGCCCCCNNAA
TCNNNCCGGGG

which exhibits the appropriate overhangs. When using
only two 16 member libraries this allows the production
of 256 different adapters.
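
The figure of 256 is simply 16 x 16, as the following Python
sketch shows. The common region used here and the convention of
writing every oligonucleotide 5' to 3' are assumptions made for
the purpose of illustration and are not taken from the example
sequences above; the names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


CORE = "GGGCCCC"   # hypothetical common region shared by both libraries


def oligo_library(core):
    """Build 16 oligos: common core, then NN, then two specific bases."""
    return {"".join(p): core + "NN" + "".join(p)
            for p in product(BASES, repeat=2)}


first_side = oligo_library(CORE)             # addresses the first molecule
second_side = oligo_library(revcomp(CORE))   # addresses the second molecule

# Any member of one library can be hybridized with any member of the other.
adapters = {(a, b): (first_side[a], second_side[b])
            for a in first_side for b in second_side}

print(len(adapters))             # 256
print(adapters[("AA", "TC")])    # ('GGGCCCCNNAA', 'GGGGCCCNNTC')
```

Because the cores of the two libraries are reverse complements
of one another, each combination anneals through the core and
leaves the two 4 base regions single stranded.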

In generating appropriate adapters conveniently the
amount of mis-match which needs to be tolerated when
binding to overhangs on first, second and/or third
nucleic acid molecules should be reduced. This may
conveniently be achieved by selecting oligonucleotides
on the basis of the probability of a mismatch ligation
being generated. A computer program for achieving this
is described in more detail in Example G. This method
allows sets of oligonucleotides to be identified which
can be used to construct chains with more than 100
fragments in a single ligation cycle but with very low
levels of mis-match. Thus in a further feature the
present invention provides computer software adapted to
identify adapter molecules for use in the method of the
invention.
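
One possible way of selecting overhangs so as to limit
mis-match ligation is sketched below in Python. This is not the
program of Example G; it is a hypothetical greedy selection
which uses the number of non-pairing positions as a crude
stand-in for the probability of a mismatch ligation. The
function names and the threshold are assumptions.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_overhangs(length=4, min_mismatches=2):
    """Greedily pick overhangs that pair poorly with everything but
    their own intended partner (their reverse complement)."""
    chosen = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        # Reject overhangs able to pair with another copy of themselves.
        if hamming(candidate, revcomp(candidate)) < min_mismatches:
            continue
        # Reject overhangs too close to a chosen overhang or its partner.
        if all(hamming(candidate, c) >= min_mismatches
               and hamming(candidate, revcomp(c)) >= min_mismatches
               for c in chosen):
            chosen.append(candidate)
    return chosen


selected = select_overhangs(length=4, min_mismatches=2)
print(len(selected), selected[:5])
```

A greedy pass of this kind does not guarantee the largest
possible set; it merely yields a set in which every unintended
pairing has at least the stated number of mismatches.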

As mentioned above, the production of fragments of
said first nucleic acid molecule is achieved using a
nuclease which has a cleavage site separate from its
recognition site. In so doing, unique overhangs are
created which reflect the sequence of that molecule. In
a preferred feature, said nuclease is a class IP or IIS
restriction enzyme or functional derivatives thereof.
Such enzymes include enzymes produced synthetically
through the fusion of appropriate domains to arrive at
enzymes which cleave at a site distal to their
recognition site.
These enzymes exhibit no specificity to the
sequence that is cut and they can therefore generate
overhangs with all types of base compositions. Cleavage
with IIS enzymes results in overhangs of various lengths,
e.g. from -5 to +6 bases in length. Preferably for
performing the method of the invention, enzymes are
chosen which generate 3-6, e.g. 4 base pair overhangs.
Preferred enzymes for use in the invention include
enzymes which produce 4 base overhangs at the 3' end:
BstXI; 5 base overhangs at the 3' end: AloI, BaeI, BplI,
Bsp24I; 6 base overhangs at the 3' end: CjeI, CjePI,
HaeIV; 4 base overhangs at the 5' end: AceIII, Acc36I,
Alw26I, AlwXI, Bbr7I, BbsI, BbvI, BbvII, Bbv16II,
Bli736I, BpiI, BpuAI, BsaI, Bsc91I, BseKI, BseXI, BsmAI,
BsmBI, BsmFI, Bso31I, Bsp423I, BspBS31I, BspIS4I,
BspLU11III, BspMI, BspST5I, BspTS514I, Bst12I, Bst71I,
BstBS32I, BstGZ53I, BstTS5I, BstOZ616I, BstPZ418I,
Eco31I, EcoA4I, EcoO44I, Esp3I, FokI, PhaI, SfaNI,
Sth132I, StsI; and 5 base overhangs at the 5' end: HgaI.


Over 100 classes of IIS restriction endonucleases
have been identified and there are large variations both
with respect to substrate specificity and cleaving
pattern. In addition, these enzymes have proved to be
well suited to "module swapping" experiments so that one
can create new enzymes for particular requirements
(Huang B. et al., J. Protein Chem. 1996, 15(5):481-9;
Bickle T.A., 1993, in Nucleases (2nd edn); Kim Y.G. et
al., PNAS 1994, 91:883-887). In these experiments the
binding domain of transcription factor Sp1 was merged
with the cleavage domain of FokI to construct a class
IIS restriction endonuclease that makes a 4-base
overhang at Sp1 sites. In other experiments a class
IIS restriction endonuclease that cuts outside the
binding sites of transcription factor Ultrabithorax was
generated. Corresponding experiments have been
conducted on class I enzymes. By merging the N-terminal
part of the hsdS sub-unit of StyR124I (which recognizes
GAAN6RTCG) with the C-terminal part of the hsdS sub-unit
of StyR124I (which recognizes TCAN7RTTC) a new enzyme
that recognizes the sequence GAAN6RTTC was constructed.
Several other experiments have been carried out with
similar success. Unlike in the case of ordinary class
II enzymes, it is therefore reasonable to assume that a
number of new IIS and IP restriction enzymes can be
constructed and adapted to cloning requirements that may
arise in the future. Very many combinations and
variants of these enzymes can therefore be used
according to the principles described herein.

Generation of the single stranded regions on said 
first nucleic acid fragment may be achieved directly by 
cleavage of said first nucleic acid molecule with 
nucleases described herein without the development of 
intermediate molecules. This forms a preferred feature 
of the invention. Alternatively, indirect and more 
elaborate techniques may be used. For example, the 
first nucleic acid molecule or a fragment thereof may be 
"trimmed" using the nucleases described herein, in which 
linker molecules which carry the nuclease recognition 
site are bound to the first nucleic acid molecule or 
fragment thereof, and cleavage outside the recognition 
site results in cleavage within the first nucleic acid 
molecule or fragment thereof. This method is 
particularly useful since it takes advantage of the fact 
that T4 DNA ligase (and also other ligases) works well 
in most buffers used for restriction cutting. Ligation 
and cleavage can therefore be performed simultaneously 
in the same solution. Furthermore, this method allows
the generation of a unique overhang when the overhang 
generated by the first cleavage step is not unique. 

The trimming procedure may be initiated using an 
"initiation linker" that is addressed to an overhang on 
the first nucleic acid molecule or fragment thereof, 
e.g. after cleavage with one or more restriction 
endonucleases as described herein. As used herein, a
"linker" refers to a molecule which is similar to an
"adapter" as described herein, except that the linker
need only contain one single stranded region to allow
binding to the molecule to be trimmed. Furthermore, the
initiation linker contains one or more cleavage sites
for nucleases that cleave outside their own recognition
sequence, as described herein, for example BplI. The
first nucleic acid molecule or fragment thereof should
preferably not contain cleavage sites for the IIS
enzyme(s) used for the trimming procedure. Such
cleavage sites may alternatively be inactivated prior to
the trimming procedure (e.g. by methylation).

Propagation linkers (if used) and a termination
linker (wherein the latter may be an adapter as
described herein), T4 DNA ligase and the IIS enzyme(s)
used for the trimming may be added together with the
initiation linker. Once the initiation linker has been
ligated into position, cleavage may be effected
resulting in the generation of an overhang within the
first nucleic acid molecule or fragment thereof. If
desired (ie. if further trimming is required), a
propagation linker containing degenerate overhangs may
be used to ligate with the overhang which has been
generated. Since the linker will also carry an
appropriate nuclease recognition site, cleavage will
again take place, at a site further upstream into the
first nucleic acid molecule or fragment thereof. This
process will continue until an overhang is generated
that is complementary to one of the overhangs in the
termination linker (or adapter as described herein).
This final linker will not itself have the nuclease
recognition site and will therefore terminate trimming.
As mentioned previously, this termination linker may
have an appropriate single stranded region for binding
to the adapter used in the next step, or may itself be
the adapter. An appropriate technique for performing
the trimming method may be found in Examples 4 and 9.

The trimming method is preferably not performed 
with IIS enzymes belonging to the BcgI class (e.g. BplI,
BaeI etc.) as the proteins are combined methylases and
endonucleases and the methylase function may inactivate
the binding sites on propagation linkers. Enzymes
including FokI, HgaI etc. are therefore preferred
enzymes for performing this method. If BcgI class
enzymes are to be used, the cofactor AdoMet should be
replaced with AdoHcy, sinefungin or other cofactors
that cannot function as methyl donors.

Thus in a preferred feature the invention provides 
a method of removing the end terminus of a double 
stranded nucleic acid molecule with at least one single 
stranded region, comprising at least the steps of (i) 
binding (ie. ligating) a double stranded linker molecule
containing a recognition site for a nuclease which 
cleaves outside its recognition site and a single 
stranded region complementary to the single stranded 
region on said double stranded nucleic acid molecule to 
said molecule and cleaving using said nuclease, thereby 
resulting in removal of one or more bases (e.g. 3-10, 
which may be in single or double stranded form, or a 
combination thereof) from the terminus of said nucleic 
acid molecule, (ii) optionally binding one or more 
propagation linkers which contain a recognition site for a
nuclease as described above and a degenerate single 
stranded region which binds to the overhang generated by 
the first or subsequent cleavage steps and cleaving 
using said nuclease, and (iii) adding a termination 
linker which binds to the single stranded region 
generated in steps i or ii. 
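
The cycle in steps (i) to (iii) can be pictured with a small simulation. This is a minimal sketch under simplifying assumptions and does not reproduce any Example of this document: the molecule is modelled by its top strand only, every cleavage is assumed to remove a fixed number of bases (`step`), propagation linkers are assumed always to ligate, and trimming stops as soon as the exposed overhang equals the sequence addressed by the termination linker. The function name `trim` and its parameters are hypothetical.

```python
def trim(target: str, stop_overhang: str, step: int, max_cycles: int = 100):
    """Simulate linker-mediated trimming from the left-hand terminus of `target`.

    Assumptions (illustrative only): `target` is the top strand, each
    ligation/cleavage cycle removes exactly `step` bases, and the
    termination linker binds as soon as the exposed overhang equals
    `stop_overhang`.
    Returns (number of cleavage cycles performed, remaining sequence).
    """
    if step < 1:
        raise ValueError("step must be a positive number of bases")
    width = len(stop_overhang)
    pos = 0  # index of the first base of the currently exposed overhang
    for cycle in range(max_cycles + 1):
        exposed = target[pos:pos + width]
        if len(exposed) < width:
            raise ValueError("molecule consumed before a matching overhang appeared")
        if exposed == stop_overhang:
            # The termination linker ligates here; it carries no recognition
            # site, so no further cleavage takes place.
            return cycle, target[pos:]
        # An initiation or propagation linker ligates and the nuclease
        # cuts `step` bases further into the molecule.
        pos += step
    raise ValueError("no matching overhang within max_cycles")


cycles, remainder = trim("AAAACCCCGGGGTTTTACGT", "GGGG", step=4)
print(cycles, remainder)
```

In practice the overhang exposed after each cut depends on the enzyme's cutting geometry, so the fixed `step` is only a stand-in for that geometry.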

A similar technique may be used to remove unwanted 
sequences, e.g. contributed by the adapter after 
ligation of the first nucleic acid molecule fragment and 
second (or third) nucleic acid molecules. Various 
techniques may be used to remove the unwanted sequences,
e.g. if the sequence (e.g. a region from the adapter)
contains a plant transposon sequence, this may be
removed by adding necessary transposase enzymes to
excise that sequence. Alternatively, the unwanted
sequence may be removed by taking advantage of nucleases
that cleave outside their recognition site. Thus, for
example, adapters may be used which contain recognition
sites for such enzymes which on cleavage (by appropriate
selection of cleavage site sequences), result in
overhangs generated at two distinct cleavage sites which
are complementary and thus allow concomitant excision of
the intervening sequence. Examples of techniques for
removing intervening sequences are shown in Example
5. It will be appreciated that depending on the
nuclease employed, it may be necessary to inactivate 
sites for that enzyme at locations other than adjacent 
to or within the intervening sequence. 

Thus, in a further preferred feature, adapters as 
used herein, additionally comprise one or more nuclease 
recognition and cleavage sites whereby arrangement of 
said sequences allows, on cleavage, generation of 
complementary single stranded regions wherein each one 
of said pair of single stranded regions is generated by 
cleavage at a distinct site. 

Depending on how the different steps in the method 
of the invention are performed, as described 
hereinafter, where necessary the second nucleic acid 
molecule, and/or the adapters may also be cleaved or 
digested to provide appropriate single stranded regions. 
In a preferred feature, the second nucleic acid molecule 
and/or the adapters are cleaved using the nucleases 
described above for generating the first nucleic acid 
molecule fragments. However, instead of cleavage with 
such nucleases, to generate appropriate single stranded 
regions and/or fragments from the second or third
nucleic acid molecules or adapters, alternative
techniques may be used. Thus for example other
restriction enzymes, non-specific nucleases or
appropriate exonucleases or mechanical methods such as
sonication or vortexing may be used. Where enzymes are
employed, small volumes are preferably used during the
reactions to increase efficiency.

Ligation between the adapters and first, second and
third nucleic acid molecules is achieved by any
appropriate technique known in the art (see for example,
Sambrook et al., in "Molecular Cloning: A Laboratory
Manual", 2nd Ed., Editor Chris Nolan, Cold Spring Harbor
Laboratory Press, 1989). For example, ligation may be
achieved chemically or by use of appropriate naturally
occurring ligases or variants thereof. Appropriate
ligases which may be used include T4 DNA ligase, and
thermostable ligases, such as Pfu, Taq, and Tth DNA
ligase. Ligation may be prevented or allowed by
controlling the phosphorylation state of the terminal
bases e.g. by appropriate use of kinases or
phosphatases. Appropriately large volumes may also be
used to avoid intermolecular ligations. Thus, high
adapter to vector/insert ratios may be used to avoid the
vector or insert religating into its source material.

Other techniques may be used to avoid or remove
vectors which become religated or which do not cleave.
For example the insert may be cloned into a selection
marker that destroys the host bacteria unless it has
been inactivated by the insert. Alternatively
restriction cleaving using restriction enzymes specific
for the fragment removed from the vector may be
performed after the ligation step. Religated and
uncleaved vectors would be cleaved in this step. The
ideal cloning site is therefore one which contains
many unique restriction sites that are removed upon
insert ligation. Alternatively well-known techniques
may be used for identifying the desired product, e.g.
gel separation.

If the steps of cleavage and ligation are performed
together, advantageously the insert and the vector into
which it is inserted do not contain binding sites for
the nuclease used. Similarly, it is advantageous if the
fragment removed from the vector during the process of
cloning contains binding sites for the nuclease. In
that case, if that fragment religates with the vector it
would be cleaved and thereby removed again.

Once the first and second nucleic acid molecules
(and optionally third nucleic acid molecules) or
fragments thereof have been covalently attached, where
necessary selection of appropriate products from any
side-products may be performed. Selection may be
performed by any techniques known in the art.
Conveniently however, labelled probes may be used to
identify sequences present only in the correct product,
e.g. by probing for one or more sequences formed only
through the union of the correct sequences, e.g. a probe
directed to the junction between the adapter and the
first, second or third nucleic acid sequences.
Alternatively, the correct ligation may be detected by
functional properties bestowed on the product through
ligation, e.g. through the completion of sequences which
allow expression of a particular product once the vector
has been cloned into an appropriate host.
Alternatively, selection may be performed by sequencing
of the products which have been obtained, e.g. after
amplification and/or transformation.

Appropriate labels include any moieties which
directly or indirectly allow detection and/or
determination through the generation of a signal.
Many appropriate labels exist, including for example
radiolabels, chemical labels (e.g. EtBr, TOTO, YOYO and
other dyes), chromophores or fluorophores (e.g. dyes
such as fluorescein and rhodamine), or reagents of high
electron density such as ferritin, haemocyanin or
colloidal gold. Alternatively, the label may be an
enzyme, for example peroxidase or alkaline phosphatase,
wherein the presence of the enzyme is visualized by its
interaction with a suitable entity, for example a
substrate.

As mentioned previously, one of the significant
advantages which this method offers over known methods
is the simplification of the techniques which are
required. The steps described herein may be performed
sequentially in separate tubes (e.g. when different
enzymes are used and cross-reaction is undesirable) or
in a limited number of steps. However, ideally, the
reaction is performed in a single step. This can be
achieved by appropriate selection of enzymes, adapters
and second/third nucleic acid molecules, e.g. vectors.
Thus for example the first nucleic acid molecule
may be fragmented using a particular nuclease which is
also used to fragment the second nucleic acid molecule.
Since the enzyme used will cleave outside its
recognition site, it would be expected that the
resulting single stranded regions found on both the
first and second nucleic acid molecule fragments will be
unrelated. However, by appropriate choice of the
mediating adapters (which may also be added providing
they do not have restriction sites for that enzyme, or
that cleavage at those sites reveals appropriate single
stranded regions), these unrelated sequences may be
linked via the intermediacy of the adapters. Thus the
entire reaction may be performed in a single step.

It will also be appreciated that the adapters may
be used to address the first nucleic acid fragments to
different second nucleic acid fragments or cleavage
sites. This would therefore allow different first
nucleic acid molecule fragments to be directed and
ligated to a particular vector or site within a vector.
Thus multiple vectors (and corresponding appropriate
adapters) may be used simultaneously and take up a
single first nucleic acid molecule fragment.

Alternatively, multiple fragments or copies of the
same fragment could be inserted at different sites
within the same vector (in the latter case by the use of
adapters with one common end but with the other end
exhibiting variability to allow it to bind to different
sites within the vector). In a further alternative, the
first nucleic acid molecule fragments could be captured
in the reverse orientation (again by appropriate adapter
choice) and inserted into a vector, e.g. to produce
antisense strands.

Thus in a preferred embodiment the method described
herein is performed in a single step. The ligation
steps (ie. adapter to first nucleic acid molecule
fragment and final ligation) may however be conducted
separately once association of the relevant molecules
has been achieved. In a further preferred embodiment,
the invention provides a method of simultaneously
attaching two or more fragments of the first nucleic
acid molecule to different second nucleic acid molecules
(or different termini thereof). In cloning, this
equates to the introduction of the two or more fragments
into different sites in said second nucleic acid
molecules or into different second nucleic acid
molecules, e.g. into different sites within a vector or
into different vectors.

Thus the present invention provides methods of the
invention in which two or more fragments of the first
nucleic acid molecule are attached to different second
and optionally third nucleic acid molecules, or
different termini thereof. In a preferred feature,
methods are provided wherein one or more fragments of
said first nucleic acid molecule are attached via
adapters to single stranded regions in said second
nucleic acid molecule resulting from different cleavage
events. As a further preferred feature, methods are
provided wherein one or more fragments of said first
nucleic acid molecule are attached via adapters to
single stranded regions in two or more second nucleic
acid molecules.

It will be appreciated that even more complex
reactions may be envisaged in which multiple first
nucleic acid molecules (e.g. 2 or more, e.g. 2-10) are
simultaneously cleaved in the same reaction and their
fragments bound to appropriate adapters which direct
them to bind to different second nucleic acid molecules,
e.g. different vectors or sites in vectors.

Whilst the above described methods describe an
especially simplified method, the above described
effects may also be achieved by performing the method in
discrete steps. This is particularly appropriate where
different enzymes are used which would produce
undesirable products in other molecules. Thus for
example, different nucleases, such as restriction
enzymes, may be used to cleave the first and second
nucleic acid molecules. In such cases, the molecules
are cleaved separately, whereafter the enzymes are
removed or inactivated before the fragments are mixed
together with the adapters. Similarly, even if the same
enzyme is used, if the adapters contain enzyme sensitive
sites, the adapters could be appropriately modified to
avoid reaction, e.g. by methylation, or the enzymes used
to fragment the first and/or second nucleic acid
molecules would be inactivated or removed (as mentioned
above) prior to the addition of the adapters.

Conveniently, inactivation of enzymes may be
achieved by incubation at at least 65°C, e.g. for 20
minutes. Alternatively, appropriate techniques
employing removal of the enzymes from the reaction, use
of chelators, inhibitors etc. may be used to achieve
inactivation.

Once appropriate clones have been generated and
selected these may be treated according to standard
methods of amplification, transformation, replication,
expression, sequencing, depending on the proposed
application of the clones. Other aspects of the
invention thus include the nucleic acid molecule product
of the method (ie. the nucleic acid molecule that is the
[first nucleic acid molecule fragment]:[adapter]:[second
nucleic acid molecule] product), such as cloning and
expression vectors comprising that nucleic acid molecule
product as well as transformed or transfected
prokaryotic or eukaryotic host cells, or transgenic
organisms containing a nucleic acid molecule produced
according to the method of the invention.

Appropriate expression vectors include appropriate
control sequences such as for example translational
(e.g. start and stop codons, ribosomal binding sites)
and transcriptional control elements (e.g. promoter-
operator regions, termination stop sequences) linked in
matching reading frame with the nucleic acid molecules
of the invention. Appropriate expression systems are
well known and documented in the art as well as methods
for their introduction and expression in prokaryotic or
eukaryotic cells or germ line or somatic cells to form
transgenic animals. Appropriate expression vectors for
transformation include bacteriophages and viruses, such
as baculovirus, adenovirus and vaccinia viruses.

Kits for performing the methods described herein
form a preferred aspect of the invention. Thus viewed
from a further aspect the present invention provides a
kit for attaching a first nucleic acid molecule fragment
to a second nucleic acid molecule or a fragment thereof
comprising at least (i) one or more adapters as
described hereinbefore or means for producing such
adapters, (ii) the second nucleic acid molecule and
(iii) a nuclease which cleaves outside its recognition
site, wherein the terminus of one of said adapters has a
single stranded region complementary to a single
stranded region generated on said second nucleic acid
molecule after cleavage with said nuclease.

Preferably said kit comprises a library of
oligonucleotides, e.g. as described herein, particularly
as described in Example 3, from which appropriate
adapters may be generated. The library of
oligonucleotides as described herein forms a further
preferred feature of the invention. Thus for example
said library may comprise a plurality of
oligonucleotides comprising 1) a plurality of
oligonucleotides of the formula XNNNNN wherein X is one
or more bases (wherein said bases are as described
hereinbefore) and is invariant in all of said
oligonucleotides and each N is a base at the 5' end
which is varied in the different oligonucleotides, ie.
to produce 1024 variants, 2) a plurality of
oligonucleotides of the formula X'NNNN wherein X' is
complementary to X and is invariant in all of said
oligonucleotides and each N is a base at the 5' end as
described hereinbefore, 3) a plurality of
oligonucleotides of the formula YNNNNN wherein Y, which
is not the same as X, is one or more bases (wherein said
bases are as described hereinbefore) and is invariant in
all of said oligonucleotides and each N is a base at the
3' end as described hereinbefore, and 4) a plurality of
oligonucleotides of the formula Y'NNNNNN wherein Y' is
complementary to Y and is invariant in all of said
oligonucleotides and each N is a base at the 3' end as
described hereinbefore.
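
The figure of 1024 variants follows from five variable positions with four possible bases each (4^5 = 1024). The following minimal sketch enumerates such a library; the invariant sequence "GATTACA" is an arbitrary stand-in for X, the function name `make_library` is hypothetical, and placing the variable bases after the invariant part is an assumption made purely for illustration.

```python
from itertools import product

BASES = "ACGT"


def make_library(constant: str, n_variable: int) -> list[str]:
    """Return every oligonucleotide consisting of `constant` followed by
    `n_variable` positions that each range over A, C, G and T."""
    return [constant + "".join(tail) for tail in product(BASES, repeat=n_variable)]


library = make_library("GATTACA", 5)  # "GATTACA" is an arbitrary stand-in for X
print(len(library))  # 4**5 members
```

The same function with `n_variable=4` gives 256 members, which shows how quickly the library size grows with each additional variable base.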

Optionally the kit may contain other appropriate
components selected from the list including ligases,
enzymes necessary for inactivation and activation of
restriction or ligation sites, primers for amplification
and/or appropriate enzymes, buffers and solutions, and a
data carrier containing a computer program to assist in
the selection of oligonucleotides from the above
mentioned library. The use of such kits for performing
the method of the invention forms a further aspect of
the invention.

The above described method may be adapted to
combine multiple first, second, third etc. nucleic acid
molecules as described below. In this method multiple
fragments are combined by appropriate selection of the
single stranded regions which appear at their ends.
This has application in the production of specific
sequences for biological purposes, but has particular
utility in the production of nucleic acid molecule
chains in which the units making up the chains each
denotes a unit of information, ie. the chains may be
used to store information, as will be described in more
detail below. As used herein "chain" refers to a serial
arrangement of fragments as described herein. Such
chains are preferably linear and include branched and
unbranched fragment sequences. Thus, for example,
branched DNA fragments may be used to provide chains
with a branched arrangement of fragments.

To produce nucleic acid molecule chains with
different unit fragments, ie. fragment chains, the
following method may be used. Firstly it is necessary
to generate fragments which have overhangs at either
end, to allow them to bind to one another. (The
ultimate 3' and 5' fragments may however have an
overhang at only the end which will become attached to
internal fragments.) As will be described in more
detail below, for certain applications appropriate
oligonucleotides may be derived from libraries in which
the members exhibit variability in at least some of
their bases. If libraries are to be produced in which
the members are double stranded, it will be appreciated
that the number of members in such a library could be
rather high. This can however effectively be reduced by
using a smaller number of smaller building blocks.

One strategy is to make two single-stranded
oligonucleotides using conventional techniques. In the
example described above (6 base double stranded linker
and 3 base overhangs at either end), oligonucleotides
having a region of 6 bases which complement each other
and so allow hybridization may be used. Since not all
of each molecule is involved in the hybridization, the
remaining bases extend beyond the hybridizing region,
thus creating single stranded regions.
Conveniently the number of required library members may
be reduced even further if repeat sequences appear with
frequency in the fragment chain. This will be described
in more detail below.

Once the appropriate double stranded chain units
(ie. fragments) have been created they may be ligated
together in the same solution, providing the different
overhangs present on the sequences are unique.

Thus in a further aspect, the present invention
provides a method of synthesizing a double stranded
nucleic acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid
fragments, wherein at least n-2 fragments have single
stranded regions at both termini and 2 fragments have
single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid
fragments, simultaneously or consecutively, to effect
binding of said complementary pairs of single stranded
regions, and

3) optionally ligating said complementary pairs
simultaneously or consecutively to produce a nucleic
acid molecule consisting of n fragments.
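
Steps 1) to 3) can be sketched in code as follows. This is an illustrative model only, under stated assumptions: each fragment is written as its top strand, a complementary pair of single stranded regions is represented by one shared junction sequence (the right-hand junction of one fragment equals the left-hand junction of the next), and the names `Fragment` and `assemble` are hypothetical. The uniqueness check mirrors the requirement that the different overhangs be unique.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    left: Optional[str]   # junction sequence at the left end (None for the first fragment)
    body: str             # double stranded part, written as the top strand
    right: Optional[str]  # junction sequence at the right end (None for the last fragment)


def assemble(fragments: list[Fragment]) -> str:
    """Order n fragments by their n-1 unique junctions and return the joined top strand."""
    lefts = [f.left for f in fragments if f.left is not None]
    rights = [f.right for f in fragments if f.right is not None]
    if len(set(lefts)) != len(lefts) or len(set(rights)) != len(rights):
        raise ValueError("junction overhangs must be unique")
    if set(lefts) != set(rights):
        raise ValueError("every overhang needs exactly one complementary partner")
    starts = [f for f in fragments if f.left is None]
    if len(starts) != 1:
        raise ValueError("exactly one fragment must lack a left overhang")
    by_left = {f.left: f for f in fragments if f.left is not None}
    current, parts, used = starts[0], [starts[0].body], 1
    while current.right is not None:
        parts.append(current.right)        # the paired single stranded regions
        current = by_left[current.right]   # the fragment carrying the partner overhang
        parts.append(current.body)
        used += 1
    if used != len(fragments):
        raise ValueError("not all fragments joined into a single chain")
    return "".join(parts)


pool = [
    Fragment("GAT", "TTTT", "ACA"),
    Fragment(None, "AAAA", "CTG"),
    Fragment("ACA", "GGGG", None),
    Fragment("CTG", "CCCC", "GAT"),
]
print(assemble(pool))
```

The fragments are deliberately listed out of order: because each junction occurs only once, the order of the chain is fixed by the overhangs alone, which is what allows all n fragments to be combined in one solution.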

The terms "nucleic acid molecule", "single stranded
regions", "complementary", "binding" and "ligating" are
as described hereinbefore.

In step 1) reference is made to (n-1) single
stranded regions complementary to (n-1) "other" single
stranded regions. This describes two families of single
stranded regions, which together comprise 2(n-1)
members, forming n-1 pairs. Thus "other" refers to
single stranded regions in the second family which are
not present in the first family.

"Contacting" as used herein refers to bringing
together the double stranded fragments under conditions
which are conducive to association of the complementary
single stranded regions. Depending on the method used,
this may ultimately allow ligation of the fragments
carrying those regions. It should however be noted that
the fragments may be linked by methods other than
ligation. For example PCR may be used with appropriate
primers, e.g. pairs of primers.

Simultaneous or consecutive contacting and/or
ligation refers to the possibility of adding the
fragments individually or in groups to a growing chain
or simultaneously adding all n fragments together,
wherein ligation may be performed after each addition or
once all n fragments have been combined. Preferably
ligation is effected once all fragments have been
combined.

"Fragments" as used herein are as defined herein 
before, but preferably are shorter in length. Thus
fragments are preferably greater than 6 bases in length
(wherein said length refers to the length of each single
stranded oligonucleotide making up the fragment which
may differ slightly in length from one another), e.g.
between 6 and 50 bases, e.g. from 8 to 25 bases.

As referred to herein, "n" is an integer of at 
least 4, for example at least 10 or 100, e.g. between 25 
and 200.

Preferably, as mentioned above, the fragments are
generated by the use of single stranded oligonucleotides
to generate appropriate double stranded molecules.

Of particular interest in such methods is the
production of fragment chains that may be used to store
information in the form of code which may readily be
accessed.

There is currently a great need for storing
information for different purposes (e.g. computer
software, music, films, databases etc.). It has
therefore been imperative to find efficient storage
media, resulting in the development of CD ROMs, DVD
technology etc. Nucleic acid molecules offer far more
efficient methods for storing information and have
several advantages over storage methods currently in
use. For example, the storage capacity of nucleic acid
molecules is vast. In principle, a test-tube containing
DNA molecules may contain as much information as several
million CD ROMs or more. Nucleic acid may be copied
quickly and efficiently using natural systems which are
greatly enhanced by techniques which have been developed
such as PCR, LCR etc. When stored appropriately,
nucleic acid molecules may be preserved for extremely
lengthy periods. Naturally existing tools for
manipulation of nucleic acid molecules are already
available for processing of the molecules, e.g.
polymerases, restriction enzymes, transcription factors,
ribosomes etc. The nucleic acid molecules may also have
catalytic properties.

Furthermore, nucleic acid molecules may be used as 
secure systems since they may be made such that they are 
not readily copied, unlike copying of current storage 
systems, e.g. CDs etc., which is increasingly prevalent. 

Previously however, it was not possible to take
advantage of the enormous potential offered by nucleic
acid molecules due to the absence of any effective
methods for writing DNA messages or reading DNA
messages. The method described above overcomes this
problem, allowing the rapid synthesis of large DNA
molecules and the rapid and efficient scanning of those
molecules to retrieve the information.

The key to effective retrieval of information
encoded by the nucleic acid molecules produced according
to the method described herein, is the expansion of the
information providing unit in the molecule. In nature
and in methods used previously, each base in the
sequence has an individual informational content.
Indeed methods have been described in which a single
base may signify more than a single informational unit,
e.g. in binary code, the bases A="00", C="01", G="10" and
T="11". Whilst this has advantages insofar as
significant amounts of information can be contained in a
single molecule, the system has serious drawbacks as it
requires writing and reading methods in which individual
bases may be attached and discriminated.
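
The previously described one-base-per-two-bits scheme (A="00", C="01", G="10", T="11") can be illustrated with a short sketch. The function names `bits_to_bases` and `bases_to_bits` are hypothetical and are introduced here only to show the mapping; they do not correspond to any method described in this document.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bits_to_bases(bits: str) -> str:
    """Encode a string of '0'/'1' characters, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string must have an even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bits(sequence: str) -> str:
    """Decode a base sequence back into its bit string."""
    return "".join(BASE_TO_BITS[base] for base in sequence)


message = "00011011"
encoded = bits_to_bases(message)
print(encoded, bases_to_bits(encoded) == message)
```

Every single base carries information in this scheme, which is why reading it back requires discriminating individual bases.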

In a preferred method of the invention therefore,
information units are provided which are not single
bases, but are instead short sequences. The techniques
described above allow the rapid production of such
chains and the information may be readily accessed.
Thus units representing coded information may be
generated and read. Each information unit may therefore
represent an element of code, in which the code may for
example be alphanumeric code or a simpler representation
such as binary code. In each case it is necessary for
individual elements of the code, e.g. "a", "b", "c",
"1", "0" etc. to be represented by an individualized and
specific sequence.

As used herein "information units" refer to
discrete short sequences which represent a single piece
of information, e.g. one or more (ie. combinations
thereof) elements of a code.

"Elements" of code, as mentioned above, refer to
the different members making up a code such as binary or
alphanumeric code.

Thus, in a preferred embodiment of the method of
the invention, the fragments which are linked together
comprise regions representing a unit of information
corresponding to one or more code elements. Preferably
the code is alphanumeric. Especially preferably the
code is binary. Thus for example, considering a binary
system of information capture, if one wishes to produce
chains consisting of "0", "1" fragments, appropriate
sequence combinations may be attributed to "0" or "1".

Conveniently each of said one or more code elements
(together) has the formula
(X)a,
wherein
X is a nucleotide A, T, G, C or a derivative
thereof which allows complementary binding and may be
the same or different at each position, and
a is an integer greater than 2, e.g. greater than
4, for example from 2 to 20, preferably from 4 to 10,
e.g. 6 to 8,
wherein (X)a is different for each one or more code
elements.

Especially preferably, in the case of binary code,
the code elements "1" and "0" may have the formulae:
"0" = (X)a and "1" = (Y)b,
wherein
(X)a and (Y)b are not identical,
X and Y are each a nucleotide A, T, G, C or a
derivative thereof which allows complementary binding
and may be the same or different at each position, and
a and b are integers greater than 2, e.g. greater
than 4, for example from 2 to 20, preferably from 4 to
10, e.g. 6 to 8.

As referred to herein, a "derivative" which is
capable of complementary binding refers to a nucleotide
analog or variant which is capable of binding to a
nucleotide present in a complementary strand, and
includes in particular naturally occurring or synthetic
variants of nucleotides, e.g. uracil or methylated,
amidated nucleotides etc.

In its simplest and preferred form, X and Y are the
same at each position, e.g. "0"=GGGGGGGG and
"1"=AAAAAAAA. However, repeat sequences such as [AC]6A
or [GT]*A may be used. The code sequence may also have a
functional property, e.g. it may be an integration
element such as AttP1 or AttP2.
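
As a minimal sketch of this idea, the snippet below maps each binary code element to the example information units given in the text and reads them back by matching whole units rather than single bases. The function names are hypothetical, and the overhangs that join real fragments are deliberately left out.

```python
# Information units from the text: "0" = GGGGGGGG and "1" = AAAAAAAA.
UNIT_FOR_ELEMENT = {"0": "GGGGGGGG", "1": "AAAAAAAA"}


def units_for_code(code: str) -> list[str]:
    """Return the information unit for each code element, in chain order."""
    return [UNIT_FOR_ELEMENT[element] for element in code]


def read_units(units: list[str]) -> str:
    """Recover the code by recognising whole units, not individual bases."""
    element_for_unit = {unit: element for element, unit in UNIT_FOR_ELEMENT.items()}
    return "".join(element_for_unit[unit] for unit in units)
```

Because each unit is a short sequence, a reader only has to tell two 8-base sequences apart instead of discriminating four individual bases.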

It will however be appreciated that the sequences
described above may also denote more than a single code
element. Thus for example the information unit may
denote 2 or more code elements, e.g. from 2 to 32
elements, preferably from 2 to 4 code elements. If for
example binary code is considered, each information unit
may refer to "01" or "00" or "11" or "10".

In the method described herein, chains comprising
such features may be prepared as follows. To produce a
chain with for example 8 0/1 fragments, eight "0"
starting fragments with different overhangs and eight "1"
starting fragments with different overhangs are
generated as illustrated in Figure 2. In this case "0"
fragments consist of the sequence GGGGGGGG, although
this could be replaced by other sequences. In addition
the fragments are synthesized such that they have unique
overhangs such that they may only be ligated at one
position. Thus, the fragments for position 1 in the
chain are produced such that they have an overhang which
is complemented by one of the overhangs in the fragments
for position 2. Thus, the position 2 fragments are
synthesized such that they can bind to position 1
fragments. Similarly position 3 fragments may only bind
to position 2 fragments at one of their termini and
position 4 fragments at the other terminus and so forth.
These fragments are stored separately. In order to build
up a chain, selection is made from one of the two
alternatives for each position such that an appropriate
binary chain is produced.

Thus, in the scheme outlined above, to produce a
fragment chain which represents a chain 01001011, "0"
fragments from positions 1, 3, 4 and 6 are mixed with
"1" fragments from positions 2, 5, 7 and 8. If the
fragments are then ligated together by adding ligase or
using other ligation methods mentioned previously, the
above described chain will be produced. As will be
appreciated, this chain could also be achieved using for
example only 4 fragments if the information unit carried
on each fragment denoted 2 code elements.
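
The selection step for the chain 01001011 can be sketched as below. This is an illustrative sketch with hypothetical function names; it only decides which position-specific fragment to take from the "0" or "1" stock, and does not model overhangs or ligation.

```python
def select_fragments(chain: str) -> dict[str, list[int]]:
    """Group 1-based chain positions by the code element chosen there."""
    selection: dict[str, list[int]] = {"0": [], "1": []}
    for position, element in enumerate(chain, start=1):
        selection[element].append(position)
    return selection


def fragments_needed(chain: str, elements_per_fragment: int) -> int:
    """Number of fragments when each fragment denotes several code elements."""
    if len(chain) % elements_per_fragment != 0:
        raise ValueError("chain length must be a multiple of elements_per_fragment")
    return len(chain) // elements_per_fragment
```

For example, `select_fragments("01001011")` groups positions 1, 3, 4 and 6 under "0" and positions 2, 5, 7 and 8 under "1", and `fragments_needed("01001011", 2)` gives the 4 fragments mentioned in the text.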
It is furthermore possible to combine intermediate
fragment chains (e.g. containing at least 4 fragments)
with other fragment chains which, provided appropriate
overhangs exist at their termini, may be ligated together
to form composite fragment chains. Thus, several cycles
could be conducted in parallel and the products
combined. In the method shown in Figure 2, the end
fragments have blunt ends, but clearly, appropriate
fragments could be used that similarly have overhangs at
the termini.

An appropriate technique for producing 8 fragment
chains, each containing 8 fragments which can then be
ligated together, is illustrated in Figure 3. For
fragment chain 1, end fragments are used such that it is
possible for the completed fragment chain to ligate to
fragment chain 2 and so on. These may then be combined
to produce a 64 fragment chain. Similarly, 8 such
fragment chains may be combined to produce fragment
chains comprising 512 fragments.

As will be appreciated, as with the production of
shorter chains, the step of ligation, when performed, is
conveniently effected once all the fragment chains have
been combined. However, the step of ligation may be
performed sequentially if desired on addition of each
subsequent fragment chain.

To combine 8 binary fragments per cycle, 16
different starting fragments are required, representing
the different "0", "1" alternatives at each position.
To make a chain of 64 fragments using two cycles, ie. to
produce 8 chains with 8 fragments which are then
ligated, only 16+(4x7)=44 starting fragments are
required. Thus, the number of different starting
fragments required increases almost linearly with the
number of cycles, whereas the number of different
fragment chains which can be produced increases
exponentially. As a consequence, very long
fragment chains may be produced with a relatively small
number of starting fragments.
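
The arithmetic in this paragraph can be checked with the short sketch below. Only the figures 16, 44, 64 and 512 come from the text. Extending the count of starting fragments beyond two cycles (28 more per additional cycle) is an assumption made here for illustration, as is the reading that each of the 7 further chains needs 4 new end fragments.

```python
FRAGMENTS_PER_CHAIN = 8   # fragments joined per cycle, as in the text
CODE_ELEMENTS = 2         # binary code: "0" and "1"


def chain_length(cycles: int) -> int:
    """Fragments in the final chain after the given number of cycles."""
    return FRAGMENTS_PER_CHAIN ** cycles


def starting_fragments(cycles: int) -> int:
    """Distinct starting fragments: 16 for one cycle, 16 + (4x7) = 44 for two.

    Values beyond two cycles are an assumed extrapolation, not from the text.
    """
    first_cycle = FRAGMENTS_PER_CHAIN * CODE_ELEMENTS   # 16
    per_extra_cycle = 4 * (FRAGMENTS_PER_CHAIN - 1)     # 4 x 7 = 28
    return first_cycle + per_extra_cycle * (cycles - 1)


def possible_chains(cycles: int) -> int:
    """Number of distinct binary chains of the final length."""
    return CODE_ELEMENTS ** chain_length(cycles)
```

With two cycles, 44 starting fragments give access to any of 2 to the power 64 different chains, which is the contrast between near-linear and exponential growth that the text draws.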

Of course, as mentioned previously, intermediate
chains longer or shorter than 8 may be produced. Since
a large number of permutations exist in the overhang
region, more starting fragments may be used thus
allowing larger fragments to be built up in a single
cycle. Thus, the number of cycles necessary to produce
long chains may be reduced.

Small fragment chains produced according to the
methods described herein may also be attached together
by using variations of the techniques described herein.
For example, complementary primer pairs may be used to
link the various chains as described in Example 8. In
this technique, amplification of the fragment chains is
achieved using different primer pairs. The second
primer in primer pair 1 is complementary to the first
primer in primer pair 2 and the second primer in that
pair is complementary to the first primer in primer pair
3 and so on. PCR reactions are then performed which
produce products which in single stranded form are able
to bind to one another through their complementary ends
introduced by the primer pairs. These may then be
ligated together.

Alternatively, fragment chains prepared by the
methods described herein may be amplified with a primer
which contains a restriction site to a nuclease which
cleaves outside its recognition site. These
amplification products are then digested with that
nuclease to produce non-palindromic overhangs in the end
of each fragment chain. By appropriate sequence
selection (e.g. in the primer or fragments which are
used) the overhangs which are generated allow the
different fragment chains to be combined in order.

In a preferred aspect therefore, the invention
provides a method of synthesizing a double stranded
nucleic acid molecule comprising at least the steps of:

1) generating fragment chains according to the method
described hereinbefore;

2) optionally generating single stranded regions at
the end of said fragment chains, wherein said single
stranded regions are complementary to other single
stranded regions on said fragment chains thus forming
complementary pairs of single stranded regions;

3) contacting said fragment chains with one another,
simultaneously or consecutively, to effect binding of
said complementary pairs of single stranded regions.

Optionally said chains are ligated together;
however, alternative techniques may be used to form the
ultimate chain, e.g. PCR may be used as described
herein.

Preferably intermediate fragment chains are between
4 and 20 fragments in length, e.g. 5 to 10, and between
5 and 50 such fragment chains are combined, e.g. between
10 and 20.

Conveniently fragments to be used in the method of
the invention are contained within libraries. Methods
of producing the fragments which make up the library are
well known in the art. For example a series of
oligonucleotides may be produced which comprise two
portions: a first portion which will form an overhang
at one end, and a second portion which will effect
binding to a complementary oligonucleotide and which
contains within that portion the information unit. By
producing common hybridizing portions and variant
overhangs, a series of double stranded oligonucleotides
for one or more code elements (denoted by at least a
part of the hybridizing portion) is created. This
provides a library for one (or a combination of) code
elements. Different libraries may be created for
different code elements (or combinations thereof), by
appropriate alteration of the information unit, ie. the
sequence in the hybridizing portion.

Conveniently for use in the invention, these
different double stranded oligonucleotides are arranged
in 2 dimensional arrays such that in one dimension
consecutive positions within the ultimate fragment are
indicated and in the second dimension the possible code
elements (or combinations thereof) are provided. In the
simplest case, in binary code, in which "0" and "1" are
represented by different sequences, the first dimension
would comprise fragments for each position of the
proposed fragment and the second dimension would have
only 2 variants ("0" and "1"). This may be viewed as a
single library or two libraries, ie. the "0" or "1"
libraries. Once these libraries are produced, fragment
chains with any desired order of fragments may be
readily produced.

In order to appropriately direct library members to
their correct site or well (ie. the library may be
comprised of separate solid supports, or a solid support
with different addresses, e.g. wells, or different wells
containing different solutions), any appropriate sorting
technique may be used. This sorting may be achieved by
virtue of the process used for production of the library
members, or sorting may be achieved by an appropriate
technique, e.g. by binding to complementary
oligonucleotides at the relevant library site.

Appropriate solid supports suitable for attaching
library members are well known in the art and widely
described in the literature and generally speaking, the
solid support may be any of the well-known supports or
matrices which are currently widely used or proposed for
immobilization, separation etc. in chemical or
biochemical procedures. Thus for example, the
immobilizing moieties may take the form of beads,
particles, sheets, gels, filters, membranes, microfibre
strips, tubes or plates, fibres or capillaries, made for
example of a polymeric material e.g. agarose, cellulose,
alginate, teflon, latex or polystyrene. Particulate
materials, e.g. beads, are generally preferred.
Conveniently, the immobilizing moiety may comprise
magnetic particles, such as superparamagnetic particles.

In a preferred embodiment, plates or sheets are
used to allow fixation of molecules in linear
arrangement. The plates may also comprise walls
perpendicular to the plate on which molecules may be
attached. Attachment to the solid support may be
performed directly or indirectly and the technique which
is used will depend on whether the molecule to be
attached is an oligonucleotide for fixing the library
member or the library member itself. For attaching the
library members directly, ie. not via binding to an
oligonucleotide, conveniently attachment may be
performed indirectly by the use of an attachment moiety
carried on the nucleic acid molecules and/or solid
support. Thus for example, a pair of affinity binding
partners may be used, such as avidin, streptavidin or
biotin, DNA or DNA binding protein (e.g. either the lac
I repressor protein or the lac operator sequence to
which it binds), antibodies (which may be mono- or
polyclonal), antibody fragments or the epitopes or
haptens of antibodies. In these cases, one partner of
the binding pair is attached to (or is inherently part
of) the solid support and the other partner is attached
to (or is inherently part of) the nucleic acid
molecules. Alternatively, techniques of direct
attachment may be used such as for example if a filter
is used, attachment may be performed by UV-induced
crosslinking. When attaching DNA fragments, the natural
propensity of DNA to adhere to glass may also be used.
Oligonucleotides to be used for capture of the
library members may be attached to the solid support via
the use of appropriate functional groups on the solid
support.

Attachment of appropriate functional groups to the
solid support may be performed by methods well known in
the art, which include for example, attachment through
hydroxyl, carboxyl, aldehyde or amino groups which may
be provided by treating the solid support to provide
suitable surface coatings. Attachment of appropriate
functional groups to the nucleic acid molecules of the
invention may be performed by ligation or introduced
during synthesis or amplification, for example using
primers carrying an appropriate moiety, such as biotin
or a particular sequence for capture.

In a further aspect therefore the present invention
provides a library of fragments as defined herein
comprising (n)m fragments, wherein n is as defined
hereinbefore and corresponds to the length of chain that
said library may produce, and m is an integer
corresponding to the number of possible code elements or
combinations thereof, such that fragments corresponding
to all possible code elements for each position in the
final chain are provided.

Portions of said libraries in one dimension, ie.
comprising n fragments for only a single code element
(or combinations thereof) or comprising m fragments
representing all code elements (or combinations thereof)
for a single position on the chain, form further aspects
of the invention.
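
The two-dimensional library and its one-dimensional portions can be pictured with the sketch below. It reads "(n)m" as the product of n positions and m code elements, which is an assumption consistent with the binary example in the text. The fragment labels are placeholders and the function names are hypothetical.

```python
from itertools import product


def build_library(n_positions: int, code_elements: list[str]) -> dict[tuple[int, str], str]:
    """Map (position, code element) to a placeholder fragment label."""
    return {
        (position, element): f"pos{position}:{element}"
        for position, element in product(range(1, n_positions + 1), code_elements)
    }


def portion_for_element(library: dict, element: str) -> dict:
    """One-dimensional portion: the n fragments for a single code element."""
    return {key: value for key, value in library.items() if key[1] == element}


def portion_for_position(library: dict, position: int) -> dict:
    """One-dimensional portion: the m fragments for a single chain position."""
    return {key: value for key, value in library.items() if key[0] == position}
```

For a binary chain of 8 fragments this gives 16 library members, 8 per code element and 2 per position.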

Appropriate mixing may be achieved by automation.
For example in the case of "0", "1" fragments, the
correct combination of these elements is the critical
step in terms of resource- and time-consumption. This
method is described in more detail in Example 2. In
particular, the procedure may be miniaturised providing
appropriate amplifying methods (such as cloning and/or
PCR) are employed in the last step. Thus, techniques
using technology such as sorting using flow cytometers
may be employed as described in Figure 4C. Such sorting
procedures are well established and are able to sort
approximately 5-30000 droplets per second for standard
equipment, but up to 300000 droplets per second for the
most advanced cytometers.

As mentioned previously, it is possible that each
fragment may denote more than a single code element. If
for example, each fragment denotes 5 code elements,
using existing technology and a library of 32x100
library components, if 3200 containers were connected to
a sorting device as illustrated in Figure 4C, it should be
possible to write several thousand chains with 500 code
elements per second. Clearly, a method which can
generate nucleic acid sequences with such rapidity
offers significant advantages over known methods in the
art.
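
The figures in this paragraph fit together as in the sketch below. It assumes binary code elements (so that 5 elements give 32 variants per position) and one sorted droplet per chain position. Both are assumptions used here to reproduce the numbers, not statements from the text.

```python
ELEMENTS_PER_FRAGMENT = 5    # each fragment denotes 5 code elements
POSITIONS_PER_CHAIN = 100    # library of 32 x 100 components

# Assuming binary elements, 5 elements per fragment give 2**5 variants.
variants_per_position = 2 ** ELEMENTS_PER_FRAGMENT
containers = variants_per_position * POSITIONS_PER_CHAIN
elements_per_chain = ELEMENTS_PER_FRAGMENT * POSITIONS_PER_CHAIN


def chains_per_second(droplets_per_second: int) -> float:
    """Chains written per second, assuming one droplet per chain position."""
    return droplets_per_second / POSITIONS_PER_CHAIN
```

At the upper sorting rate quoted in the text, 300000 droplets per second, this gives 3000 chains of 500 code elements per second.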

The nucleic acid molecule (ie. the fragment chain)
produced according to the above described method and the
single stranded molecules thereof comprise further
features of the invention. These molecules may as
appropriate be included into a vector, as described
hereinbefore.

Once produced, the fragment chains, in double
stranded or single stranded form, may be used in various
applications, as described hereinafter. One application
of particular utility is to store information. In such
cases appropriate means of reading the information
stored in those chains is required. In some
applications, fragment chains may be appropriately
addressed to particular sites, e.g. through binding to
oligonucleotides carried on solid supports which are
complementary to overhangs on one terminus of the
fragment chains. Alternatively appropriate
antibody/antigen, or DNA:protein recognition systems may
be used. Thus, information stored in molecules
addressed in this way, or in solution may then be
accessed.

Co-pending application PCT/GB99/04417, a copy of
which is appended hereto, describes appropriate
techniques for addressing and reading information
contained in nucleic acid molecules. Of particular note
in this respect are techniques in which fluorescence of
probes carrying fluorescent labels directed to
particular sequences is detected. In such techniques,
probes, carrying labels as described hereinbefore, may
be directed to particular fragment regions, particularly
to regions denoting code elements. The signals
generated (directly or indirectly) by those labels may
then be detected and the code element thereby
identified. If a simple binary system is used only 2
discrete labels are required and their pattern of
binding may be determined. Alternatively, if a more
complex code is reflected in the fragment chains,
correspondingly more discrete labels are required for
unambiguous detection.

Thus in a further aspect, the present invention
provides a method of identifying the code elements
contained in a nucleic acid molecule prepared as
described hereinbefore (ie. fragment chain) wherein a
probe, carrying a signalling means (e.g. a label),
specific to one or more code elements, is bound to said
nucleic acid molecule and a signal generated by said
signalling means is detected, whereby said one or more
code elements may be identified.

Preferably said signalling means is a label as
described hereinbefore.

A "probe" as referred to herein refers to an
appropriate nucleic acid molecule, e.g. made up of DNA,
RNA or PNA sequences, or hybrids thereof, which is able
to bind to the target nucleic acid molecule (which may
be single or double stranded) through specific
interactions, ie. is specific to particular code
elements, e.g. through complementary binding to a
particular sequence. Probes may be any convenient
length, to allow specific binding, e.g. in the order of
5 to 50 bases, preferably 8 to 20 bases in length.

A "signalling means" as used herein refers to a
means for generating a signal directly or indirectly. A
signal may be any physical or chemical property which
may be detected, e.g. presence of a particular product,
colour, fluorescence, radiation, magnetism,
paramagnetism, electric charge, size, or volume.
Preferably the label is a fluorophore whose fluorescence
is detected. In such cases fluorescence scanners may be
used for detection of the label and thereby
identification of the code elements.

A particular code element or combination of
elements may be identified by the appearance of a
particular signal. Clearly the position of each signal
is crucial to determining the sequence of the code
elements. As a consequence methods in which positional
information (absolute or relative) may be obtained
should be used. Appropriate techniques, e.g. using
target molecules which have been attached to a solid
support at one end, are described in co-pending
application PCT/GB99/04417.
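
The role of positional information in reading can be illustrated with the sketch below. The label names are hypothetical placeholders for the 2 discrete labels needed in a binary system, and the detection step itself is not modelled. Only the ordering of signals by position is shown.

```python
# Hypothetical label names; a binary system needs only 2 discrete labels.
ELEMENT_FOR_LABEL = {"label_A": "0", "label_B": "1"}


def read_chain(signals: list[tuple[int, str]]) -> str:
    """Decode (position, label) observations into a code string.

    Observations may arrive in any order; they are sorted by position
    before the code elements are read off.
    """
    ordered = sorted(signals, key=lambda signal: signal[0])
    return "".join(ELEMENT_FOR_LABEL[label] for _, label in ordered)
```

The same set of labels read without their positions would not distinguish, for example, "010" from "100", which is why relative or absolute position must be recorded.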

A number of applications exist for the fragment
chains once produced in nano and pico-technology, inter
alia for example by stretching of the fragment chains by
means of a stream of liquid, electricity or other
technology and using them as templates for nano and
pico-structures. The products may also be used to label
products which can then be screened to establish their
identity. Alternatively, the molecules may be used to
store information, e.g. pictures, text, music or as data
storage in DNA computers. The rapid production and
reading techniques make such applications possible for
the first time.

Kits for performing the methods described above
form a preferred aspect of the invention. Thus viewed
from a further aspect the present invention provides a
kit for synthesizing a double stranded nucleic acid
molecule comprising at least n double stranded nucleic
acid fragments, wherein at least n-2 fragments have
single stranded regions at both termini and 2 fragments
have single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs. Preferably in
excess of n fragments are supplied for production of a
chain of n fragments, such that selection of appropriate
fragments for different positions is possible. Thus in
a preferred feature said kit comprises (n)m fragments,
wherein n is as defined hereinbefore, and m is an
integer corresponding to the number of possible
variations, e.g. unique sequences or code elements or
combinations thereof, such that fragments corresponding
to all possible sequences or code elements for each
position in the final chain are provided. Preferably
these fragments are provided in appropriate libraries
arranged with reference to their position within the
fragment chain and the code element(s) which they
represent, such that desired fragments may be readily
selected from the array.

Optionally the kit may contain other appropriate
components selected from the list including ligases,
enzymes necessary for inactivation and activation of
restriction or ligation sites, primers for amplification
and/or appropriate enzymes, buffers and solutions. The
use of such kits for performing the method of the
invention forms a further aspect of the invention.

The following examples are given by way of illustration
only in which the Figures referred to are as follows:

Figure 1 shows a schematic representation of how the
method of the invention may be used to introduce an
insert into a vector, in which the insert is cleaved
from the first nucleic acid molecule, associated with
adapters and ligated thereto and then ligated into the
vector;

Figure 2 shows the production of a fragment chain using
8 "0" and "1" starting fragments with different
overhangs;

Figure 3 shows the production of a 64 fragment chain in
which 8 chains are produced comprising 8 fragments each,
in which the termini of chains 1 and 2, and 2 and 3 etc.
are complementary such that they may be ligated
together;

Figure 4 shows 3 techniques for mixing "0", "1"
fragments from a library of fragments ordered for each
position, in which in A) appropriate fragments are
selected by aspiration from appropriate wells, B)
appropriate fragments are released from the library
wells and C) a flow cytometer is used to direct
appropriate droplets to the mixing chamber;

Figure 5 shows PCR amplification of signal chain
1-0-1-0-0 using SP6 and T7 primers. Lane 1: 1 µg of 1 kb
DNA ladder (Gibco BRL), Lane 2: 10 µl of PCR amplified
fragment chain DNA using SP6 and T7 primers. Lane 3:
Same as lane 2 except for the use of SP6 and T7-Cy5
primers; and

Figure 6 shows the use of primer pairs during the
process of amplification to join together fragment
chains.

EXAMPLE 1: CLONING OF AN INSERT INTO A VECTOR, FOR
EXAMPLE FROM PHIX174 INTO PUC19

A general procedure to be followed using IIS and IP
enzymes to achieve cloning involves the use of a cloning
vector which has the following characteristics:
1) A multiple cloning site located within a gene
(lacZ, ccdB or other) that allows the detection of
successful insertion.
2) The multiple cloning site contains two flanking
HgaI sites that generate overhangs that differ from
other HgaI generated overhangs elsewhere in the vector.
The orientation of the HgaI sites ensures excision of
the sites from the vector part during digestion. To
minimize background due to undigested plasmids, several
HgaI sites and other suitable restriction enzyme sites
are included in the MCS ... The restriction enzymes are
chosen such that they cleave well in HgaI buffer and do
not have other sites in the vector.

The donor plasmid is cut with the appropriate set of IIS
and/or IP enzymes. Adapters are used to specify the
fragment to be sub-cloned into the vector, by the use of
appropriate single stranded regions on the adapters
complementary to the overhangs generated on the insert.
This results in the molecule: vector - adapter I -
insert (e.g. PhiX174 gene) - adapter II - vector.

This method is illustrated for insertion of a PhiX174
insert into a vector, e.g. pUC19. An HgaI site in a
pUC19 plasmid is chosen randomly to be our "polylinker"
while different genes and gene combinations from the
PhiX174 genome are used as "inserts".

Genes are organized in PhiX174 as illustrated below
which shows the position of genes A, B, C and E relative
to one another:

In the above, gene B is located inside gene A while gene
C is slightly overlapping with gene A (by 3 base pairs).
Gene D and K are located in the same area as gene C and
E, but are not shown. This genome area contains 9 BbvI
sites as shown on the bottom row, in which the overhang
pairs that will be generated by cutting with BbvI are as
follows with the base pair position indicated in
brackets: 1-CAGC/GTCG (3798), 2-CTGC/GACG (4215),
3-ACGG/TGCC (4398), 4-GCAT/CGTA (4677), 5-CTAT/GATA
(5049), 6-GAGA/CTCT (158), 7-GAGC/CTCG (547),
8-CAAC/GTTG (624), 9-CCAT/GGTA (892). The parts of the
PhiX174 genome not shown contain 5 more BbvI sites:
10-TACC/ATGG (1488), 11-TACC/ATGG (1592), 12-CTAC/GATG
(1639), 13-GCAC/CGTG (3294), 14-CTAA/GATT (3297). Of
these only 12 give rise to non-identical overhangs
whilst 2 result in identical overhangs.
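
The count of identical and non-identical overhangs can be checked with the sketch below, which numbers the 14 listed BbvI sites 1 to 14 in the order given. It is an illustrative check only; the helper names are hypothetical.

```python
from collections import Counter

# (site number, overhang, complementary overhang) as listed in the text.
BBVI_SITES = [
    (1, "CAGC", "GTCG"), (2, "CTGC", "GACG"), (3, "ACGG", "TGCC"),
    (4, "GCAT", "CGTA"), (5, "CTAT", "GATA"), (6, "GAGA", "CTCT"),
    (7, "GAGC", "CTCG"), (8, "CAAC", "GTTG"), (9, "CCAT", "GGTA"),
    (10, "TACC", "ATGG"), (11, "TACC", "ATGG"), (12, "CTAC", "GATG"),
    (13, "GCAC", "CGTG"), (14, "CTAA", "GATT"),
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_complementary(top: str, bottom: str) -> bool:
    """True if the second overhang is the base-by-base complement of the first."""
    return top.translate(COMPLEMENT) == bottom


def shared_overhang_sites(sites: list[tuple[int, str, str]]) -> list[int]:
    """Site numbers whose overhang pair also occurs at another site."""
    counts = Counter((top, bottom) for _, top, bottom in sites)
    return [number for number, top, bottom in sites if counts[(top, bottom)] > 1]
```

Sites 10 and 11 share the pair TACC/ATGG, leaving 12 sites with overhangs that occur only once, in line with the text.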

When HgaI is used to cleave pUC19, 4 non-identical sites
are cleaved, giving rise to 8 non-identical overhangs.
These are: 1-CTGCC/GACGG (573), 2-TTCTC/AAGAG (1131),
3-CAAGG/GTTCC (1881), 4-AGACT/TCTGA (2459).

Method:

To sub-clone gene B from Bacteriophage PhiX174 into the
designed vector, the following protocol is used:

1) 2 µg of PhiX174 DNA is digested with 2 U of BbvI (NEB)
in 1X buffer 2 (NEB), water added to a volume of 20 µl,
for 1 hr at 37°C. BbvI is then heat inactivated at 65°C




for 20 minutes. 

2) 2 µg of vector (e.g. pUC19) is digested with 2 U HgaI
(NEB) in 1X buffer 1 (NEB), water added to a volume of
20 µl, for 1 hr at 37°C. HgaI is then heat inactivated at
65°C for 20 minutes.

3) The adapters are made in separate tubes by mixing
the oligonucleotides in pairs (selected to obtain the
desired product, ie. particular gene(s), in
forward/reverse orientation) and allowing annealing.

4) 6 µl of the cleavage reaction of PhiX174 is mixed
with 3 µl of the cleavage reaction of the vector and
ligated in the presence of 5-50 pmol of each adaptor,
2-10 U/µl T4 DNA Ligase (NEB), 1X ligase buffer (NEB)
and 5% Polyethylene glycol 8000, water added to a volume
of 30 µl, at 25°C for 1 hr.

5) Conventional methods are used to transform bacteria. 

6) The colonies are then counted and some of them are
then picked for further analysis (sequencing, and the
like).

Materials:

Oligonucleotides used to address PhiX174 overhangs:
BbvI overhang 1a:
5'-CGA GCG CCT CCA GTG CAG CGG AG
BbvI overhang 5a:
5'-TATC GCG CCT CCA GTG CAG CGG AG
BbvI overhang 6b:
5'-CTCT GCG CCT CCA GTG CAG CGG AG
BbvI overhang 6 (delC):
5'-CTCT CTC CGC TGC ACT GGA GGC GC
BbvI overhang 7a:
5'-CAAC GCG CCT CCA GTG CAG CGG AG
BbvI overhang 9b:
5'-GGTA GCG CCT CCA GTG CAG CGG AG

Oligonucleotides used to address pUC19 overhangs:
Cloning site 1a:
5'-AAGAG CTC CGC TGC ACT GGA GGC GC
Cloning site 1b:
5'-CTCTT CTC CGC TGC ACT GGA GGC GC
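
The way two of these oligonucleotides can anneal into an adapter may be checked with the sketch below. Pairing "BbvI overhang 7a" with "Cloning site 1a" is an assumption made purely for illustration, as are the helper names. The check itself only shows that the shared 20-base portions are reverse complements, which leaves the 4-base and 5-base ends single stranded.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def split_oligo(oligo: str, overhang_length: int) -> tuple[str, str]:
    """Split a 5'->3' oligo into its 5' overhang and its hybridizing portion."""
    compact = oligo.replace(" ", "")
    return compact[:overhang_length], compact[overhang_length:]


# Two oligonucleotides from the Materials list, written 5'->3'.
bbvi_7a = "CAAC GCG CCT CCA GTG CAG CGG AG"
cloning_site_1a = "AAGAG CTC CGC TGC ACT GGA GGC GC"

insert_overhang, insert_core = split_oligo(bbvi_7a, 4)
vector_overhang, vector_core = split_oligo(cloning_site_1a, 5)

# The shared portions anneal if one is the reverse complement of the other.
anneals = reverse_complement(insert_core) == vector_core
```

Under these assumptions the annealed adapter carries one overhang directed at a PhiX174 BbvI end and one directed at a pUC19 HgaI end.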

Two important advantages of this recombination method
over the classical Cohen-Boyer method should be noted.
First, the procedure is very easy to perform. It involves
only mixing and incubation steps before transformation. No
PCR amplifications or gel separations are required.
Second, the method gives significant flexibility and allows
complex recombinations to be made even with only two
restriction enzymes.

EXAMPLE 2: AUTOMATION AND MINIATURISATION OF CHAIN
SYNTHESIS

This method describes a rapid process for mixing
appropriate "0" and "1" fragments with the correct
overhangs to produce a particular string consisting of
"0"s and "1"s.

Two libraries are produced, one with "0" fragments and
one with "1" fragments. As mentioned in the
description, these are generated with overhangs that can
be ligated to corresponding overhangs for fragments at
adjacent positions. These separate members are present
in separate wells to form the library, such that
position 1 fragments are present in well 1, position 2
fragments are present in well 2 and so forth. The two
libraries thus provide the alternatives for each
position. In order to generate the chain therefore it
is only necessary to select the correct fragment "0" or
"1" for position 1, and then position 2 etc. Since
these fragments, as a consequence of their unique
overhangs, may only hybridize to fragments for adjacent
positions, it is necessary only to select the correct
fragments, then mix and ligate those fragments
simultaneously. Different ways of achieving this effect
are shown in Figure 4 which shows three different
alternatives for mixing.

In Figure 4A, e.g. to produce the chain 0-1-0-0-1, the
apparatus is used to aspirate from the "0" library at
positions 1, 3 and 4, and aspirate from the "1" library
at positions 2 and 5. The liquids that have been
aspirated may then be mixed together with ligase and an
appropriate buffer. In alternative B, each well in the
library is connected with a tube/nozzle that may be
closed/opened electronically. Liquid from the nozzles
is directed into the ligation chamber together with
ligase and an appropriate buffer. Different chains may
be constructed by appropriately changing the pattern of
nozzles which are opened/closed.

The procedure may also be miniaturised, e.g. using flow
cytometry technology as illustrated in Figure 4C. In
this method, library components are stored in containers
on top of the "writing-machine". Droplets from each
container are then guided either to the waste or
production well depending on the nature of the chain
that is to be constructed. The guiding mechanism is as
used in ordinary flow cytometers, ie. the droplets are
charged when they leave the container and may be guided
electronically in different directions.

30 

EXAMPLE 3 - LIBRARIES COMPRISING OLIGONUCLEOTIDES FOR
USE IN THE INVENTION

Conveniently, the cloning method may be performed using
libraries containing oligonucleotides. For example a
library may contain:

1. Oligonucleotides with a common portion and 5 bases
at the 5' end which vary to provide all possible
permutations, i.e. 1024 variants.

2. Oligonucleotides with a common portion and 4 bases
at the 5' end which vary to provide all possible
permutations, i.e. 256 variants.

3. Oligonucleotides with a common portion and 5 bases
at the 3' end which vary to provide all possible
permutations, i.e. 1024 variants.

4. Oligonucleotides with a common portion and 6 bases
at the 3' end which vary to provide all possible
permutations, i.e. 4096 variants.
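
The variant counts listed above follow from four possible bases at each variable position (4^4 = 256, 4^5 = 1024, 4^6 = 4096). A minimal sketch that enumerates the variable ends is given below; the helper name is hypothetical and the common portion of each oligonucleotide is left out.

```python
from itertools import product

BASES = "ACGT"


def variable_ends(n):
    """Yield every possible n-base variable end (4**n sequences)."""
    for combo in product(BASES, repeat=n):
        yield "".join(combo)


for n in (4, 5, 6):
    # Count the variants for 4, 5 and 6 variable bases
    print(n, sum(1 for _ in variable_ends(n)))
```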

In the above, the oligonucleotides are produced such
that all "1" oligonucleotides are complementary to "2"
oligonucleotides by virtue of the invariant bases, i.e.
to generate a double stranded molecule with variant 4/5
base overhangs. Similarly "3" and "4" oligonucleotides
are complementary.

Oligonucleotides combined in this way (i.e. with
overhangs at either end of 4-6 bases) may also be
combined together with complementary double stranded
oligonucleotides also generated by combining certain
members of the library. In this way variable overhangs
of different lengths may be created in the resultant
molecule, e.g. a molecule with a 4 base overhang at both
the 3' and 5' end.

Oligonucleotides may also be provided in the library
which allow 5' and 3' adapters to be linked. Thus for
example oligonucleotides having the following form may
be provided:

5. 5'-AAAA-[comp1]-FFFFF-3'

6. 5'-DDDDD-[comp1]-FFFFF-3'

7. 5'-AAAA-[comp1]-HHHHHH-3'

8. 5'-DDDDD-[comp1]-HHHHHH-3'

9. 3'-[comp1*]-5'

10. 5'-BBBB-[comp2]-3'

11. 5'-EEEEE-[comp2*]-3'

12. 5'-[comp3]-GGGGG-3'

13. 5'-[comp3*]-IIIIII-3'

in which "compx" refers to a region which is
complementary to region "compx*", i.e. "5", "6", "7" or
"8" can bind to "9". Furthermore, "comp2" can bind to
oligonucleotide 1 above, "comp2*" can bind to
oligonucleotide 2, "comp3" can bind to oligonucleotide
"4" and "comp3*" can bind to oligonucleotide "3". The
bases denoted "A" bind to "B", i.e. "7" and "10" can bind
at their ends. Similarly "D" binds to "E", "F" binds to
"G" and "H" binds to "I". (These bases when together
may have a variable content, e.g. AAAA=GAGA and then
BBBB=TCTC.)

By appropriate use of the linkers described above, 5'
and 3' adapters may be combined. For example,
oligonucleotide "2" with a particular 4 base 5' overhang
may be bound through its complementary region to an
oligonucleotide linker "11" which will then leave an
"EEEEE" overlap. This may be bound to oligonucleotide
"8" through the overlap which may itself bind
oligonucleotide "9" through its complementary region.
The overlap "HHHHHH" may be bound to oligonucleotide
"13" which may attach an oligonucleotide "4" through
binding to the complementary region. Thus various
permutations may be made which result in various overlap
lengths, e.g. any combination of 4, 5, or 6 base
overlaps which may be on the same or different strands.



EXAMPLE 4 - TRIMMING PROCEDURE FOR GENERATING UNIQUE
OVERHANGS

The system presented here makes it possible to perform a
trimming procedure with seven different IIS enzymes that
make 5' 4 base overhangs (FokI and Bst71I), 5' 5 base
overhangs (HgaI), 3' 5 base overhangs (BplI and BaeI)
and 3' 6 base overhangs (CjeI and HaeIV). If the
oligonucleotide system presented here is combined with
the basic oligonucleotide kit described in Example 3,
all permutations of 3' 5 base and 6 base overhangs and
all permutations of 5' 4 base and 5 base overhangs can
be addressed for the trimming procedure.

In this Example, the location of the binding motifs of
the initiation linkers is shown below:



FokI                                   GGATG
Bst71I     --GCAGC
HgaI                                       GACGC
BplI                    GAG     CTC
BaeI                CYATG    CA
CjeI                        CCA      GT
HaeIV             GAY     RTC
Consensus  --GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC



Initiation linkers : 





X = 


0 : 


5 » 


- -GCAGCGACCATGAGTCCA- 


CTC- 


-GTGGATGPPPPPP 


25 






3 ' 


- - CGTCGCTGGTACTCAGGT - 


GAG- 


-CACCTAC 




X = 


1 : 


5 * 


- - GCAGCGACCATGAGTCCA- 


CTC- 


-GTGGATG-PPPPPP 








3 1 


- -CGTCGCTGGTACTCAGGT- 


GAG- 


- CACCTAC - 




x= 


2 : 


5 1 


- - GCAGCGACCATGAGTCCA- 


CTC- 


-GTGGATG- - PPPPPP 








3 ' 


- - CGTCGCTGGTACTCAGGT - 


GAG- 


-CACCTAC- - 


30 


x= 


3 : 


5 ' 


- - GCAGCGACCATGAGTCCA- 


CTC- 


- GTGGATG PPPPPP 








3 ' 


- - CGTCGCTGGTACTCAGGT - 


-GAG- 


- CACCTAC 




x= 


A : 


5 1 


- - G C AGCGAC C ATG AGT C C A - 


-CTC- 


-GTGGATGACGCPPPPPP 








3 ' 


- - CGTCGCTGGTACTCAGGT - 


-GAG- 


-CACCTACTGCG 




x= 


S : 


5 1 


- - G C AG CG AC C ATG AGT CCA - 


-CTC- 


- GTGGATGACGC -PPPPPP 


35 






3 ' 


- - CGTCGCTGGTACTCAGGT - 


-GAG- 


-CACCTACTGCG- 



X=6 : 5- -GCAGCGACCATGAGTCCA-CTC- -GTGGATGACGC- -PPPPPP 
3 ' - -CGTCGCTGGTACTCAGGT -GAG- -CACCTACTGCG- - 
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7 : 


5 » 


- 5 

- - GCAGCGACCATGAGTCCA 


4 - 
-CTC- 


- GTGGATGACGC - - 


-PPPPPP 






3 ' 


- - CGTCGCTGGTACTCAGGT 


-GAG- 


- CACCTACTGCG - - 




x= 


8 : 


5 ' 


- -GCAGCGACCATGAGTCCA 


-CTC- 


- GTGGATGACGC - - 


- - - PPPPPP 






3 ' 


- - CGTCGCTGGTACTCAGGT 


-GAG- 


-CACCTACTGCG- - 




x= 


9 : 


5 ' 


- -GCAGCGACCATGAGTCCA 


-CTC- 


- GTGGATGACGC - - 


PPPPPP 






3 ' 


- - CGTCGCTGGTACTCAGGT 


-GAG- 


-CACCTACTGCG- - 





The 6 base 3' overhang PPPPPP is a non-palindromic
sequence that can be ligated with the complementary
overhang QQQQQQ. The reason 10 different initiation
linkers are needed is that BaeI cuts 10 bases away
from its binding site. These linkers therefore allow a
trimming procedure where BaeI "jumps" 10 bases for each
trimming cycle. 10 different start positions will then
be necessary to cover all possibilities. On the other
hand, HgaI cuts only 5 bases away, necessitating only 5
different start positions. This is the reason the
binding site for HgaI is not present on X=0 to X=3,
above.
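
The relationship between the jump length and the number of start positions can be illustrated with a small sketch. It makes a simplifying assumption that is not spelled out in the text: positions are counted as integer offsets from an arbitrary origin, and each trimming cycle advances by exactly the jump length. The function name is hypothetical.

```python
def start_positions_needed(jump, span=100):
    """Distinct start offsets needed so that repeated jumps of `jump`
    bases can land on every position in range(span)."""
    return sorted({target % jump for target in range(span)})


# A 10-base jump (as described for BaeI) needs 10 start offsets,
# a 5-base jump (as described for HgaI) needs only 5.
print(len(start_positions_needed(10)))
print(len(start_positions_needed(5)))
```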

Propagation linkers:

FokI:    5' GGATG
         3' CCTACNNNN

Bst71I:  5' GCAGC
         3' CGTCGNNNN

HgaI:    5' GACGC
         3' CTGCGNNNNN

BplI:    5' GAG CTCNNNNN
         3' CTC GAG

BaeI:    5' CCATG CANNNNN
         3' GGTAC GT

HaeIV:   5' GAC GTCNNNNNN
         3' CTG CAG

CjeI:    5' CCA GTNNNNNN
         3' GGT CA

Termination linkers:

The adapters made with the basic oligonucleotides
described earlier can be used as termination linkers.
There is therefore no need for a separate set of
termination linkers.

Method:

In this method a trimming reaction using Bst71I that
will begin on a 3' 5 base overhang is shown. The target
DNA is shown below in which the first overhang that will
be generated is marked "*".

****
3' CACTT ****

The first Bst71I overhang in the target DNA will be
located 5-8 bases downstream of the overhang CACTT-3'. X
must therefore be 3 (see the figure below). The
following strategy can then be applied:

One linker is prepared that can address the 3' GTGAA
overhang by annealing oligonucleotide "4" (3' 6 bases,
QQQQQQ) with oligonucleotide "3" (3' 5 bases, GTGAA) in
one tube:

GTGAA-3'
3'-QQQQQQ---

The 3'-GAGTGC overhang is then ligated with the X=3
initiation linker and the GTGAA-3' overhang is ligated
with the CACTT-3' overhang on the target DNA molecule:

5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG PPPPPP
3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC QQQQQQ
GTGAA
CACTT

EXAMPLE 5 - REMOVAL OF INTERVENING SEQUENCES FROM
CONSTRUCTS

In some instances, constructs may be prepared which
contain undesirable nucleic acid sequences between, e.g.
the insert sequence and the vector sequence. Strategies
for removing the linker sequences should then be
applied. Illustrated below are some possible strategies
in which binding sites for restriction enzymes are
provided in the adapter sequences. Cleavage with the
restriction enzymes will then result in DNA ends that
can be religated. The vector DNA is marked as ..VVVVVV
while insert DNA is marked as IIIIIII.

Method 1

Two IIS enzymes that generate 5' 4 base overhangs (BbsI
and Esp3I):

..VVVVVVGAGC-GAGACG GAAGAC--GAGCIIIIIIIIII
  VVVVVVCTCG-CTCTGC CTTCTG--CTCGIIIIIIIIII..

After cleavage with BbsI and Esp3I:

..VVVVVV     + GAGC-GAGACG GAAGAC--     +
  VVVVVVCTCG       -CTCTGC CTTCTG--CTCG

  GAGCIIIIIIIIII
      IIIIIIIIII..

After ligation with T4 DNA ligase:

  GAGC-GAGACG GAAGAC--     +
      -CTCTGC CTTCTG--CTCG

..VVVVVVGAGCIIIIIIIIII
  VVVVVVCTCGIIIIIIIIII..

Method 2

One IIS enzyme that generates two 3' 3 base
overhangs (BsaXI):

..VVVVVVGAG AC CTCC GAGIIIIIIIIII
  VVVVVVCTC TG GAGG CTCIIIIIIIIII..

After cleavage with BsaXI:

..VVVVVVGAG +     AC CTCC GAG
  VVVVVV      CTC TG GAGG

  +    IIIIIIIIII
    CTCIIIIIIIIII..

After ligation with T4 DNA ligase:

      AC CTCC GAG  +
  CTC TG GAGG

..VVVVVVGAGIIIIIIIIII
  VVVVVVCTCIIIIIIIIII..

Method 3

One IIS enzyme that generates blunt ends (MlyI):

..VVVVVV GAGTC IIIIIIIIII
  VVVVVV CTCAG IIIIIIIIII..

After cleavage with MlyI:

..VVVVVV + GAGTC + IIIIIIIIII
  VVVVVV   CTCAG   IIIIIIIIII..

After ligation with T4 DNA ligase:

  GAGTC  +
  CTCAG

..VVVVVVIIIIIIIIII
  VVVVVVIIIIIIIIII..

EXAMPLE 6 - IDENTIFYING OLIGONUCLEOTIDE SETS WITH 6 BASE
PAIR OVERHANGS WITH MINIMAL MIS-MATCH LIGATIONS

In order to identify oligonucleotide sets with 6 base
pair overhangs which are unlikely to form mis-match
ligations with one another the following steps may be
taken.

1. Create all 2048 overhang pairs of 6 bases.

2. Remove the 32 palindromic pairs.

This produces a final set of 2016 overhang pairs.
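
The count of 2016 can be reproduced with a short enumeration. The sketch below works with individual 6-base sequences rather than pairs, so the 4096 sequences correspond to the 2048 pairs in the text, and the 64 self-complementary (palindromic) sequences correspond to the 32 palindromic pairs. Reading "pair" as a sequence together with the sequence it ligates to is an assumption of this sketch.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


all_overhangs = ["".join(p) for p in product("ACGT", repeat=6)]

# An overhang is palindromic when it is its own reverse complement
palindromic = [s for s in all_overhangs if s == reverse_complement(s)]

# Each non-palindromic overhang forms one pair with its partner
pairs = {
    tuple(sorted((s, reverse_complement(s))))
    for s in all_overhangs
    if s != reverse_complement(s)
}

print(len(all_overhangs), len(palindromic), len(pairs))  # 4096 64 2016
```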
PART 1

1. Take a pair as pair #1 and select the next pair by
executing section 1.

Section 1
Algorithm 1

Compute the (2016 - n) tables of unweighted mismatch
scores between the already chosen n pair(s) and all
(2016 - n) remaining pairs, and find among the latter
the pair(s) for which the lowest score in the table is
the highest (see below for details about score
computation). If there is only one such pair, then
select it. If there are several pairs, then compute the
weighted mismatch scores of the overhang comparisons
that gave the lowest unweighted score and find the
pair(s) for which the lowest weighted score is the
highest. If there is only one such pair, then select
it. If there are several pairs, then redo the whole
procedure using the second lowest unweighted score in
the mismatch table, then the third lowest, and so on.
If several pairs remain tied after all mismatch scores
have been considered, keep them all.
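
The tie-breaking cascade of algorithm 1 can be approximated by a lexicographic comparison of sorted score tables. The sketch below is a simplification and not necessarily the procedure used in the original program: it assumes each candidate's table has already been reduced to a list of (unweighted rank, weighted score) rows, and it compares the lowest row first, then the next lowest, and so on. The candidate names and scores are made up for illustration.

```python
def selection_key(rows):
    """Sort key for one candidate's table.

    rows: list of (unweighted_rank, weighted_score) tuples, one per ligation.
    Sorting ascending puts the lowest unweighted rank first and, among
    equal ranks, the lowest weighted score first.
    """
    return sorted(rows)


def select_candidates(tables):
    """Return every candidate whose sorted table is lexicographically largest.

    Candidates that remain tied on every row are all kept.
    """
    best = max(selection_key(rows) for rows in tables.values())
    return [name for name, rows in tables.items() if selection_key(rows) == best]


# Hypothetical tables for three candidate pairs
tables = {
    "pair_x": [(7, 2.4), (8, 2.6), (9, 0.8)],
    "pair_y": [(7, 2.8), (8, 2.6), (9, 0.8)],
    "pair_z": [(5, 4.6), (9, 3.2), (9, 3.0)],
}
print(select_candidates(tables))  # ['pair_y']
```

Here pair_z is rejected on its lowest unweighted rank, and pair_y wins over pair_x on the weighted score of the lowest-ranked ligation.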



Repeat algorithm 1 for each selected pair and iterate it
over the desired number of positions to obtain the
chain(s) of overhang pairs. This procedure generates a
tree with an overhang pair on each branch. The lowest
unweighted and weighted mismatch scores of the
particular combination of pairs at each point are
computed. A particular pathway is stopped (1) when the
desired number of positions is reached, or (2) when the
combination of pairs is one that has already been found
earlier, or (3) when the lowest mismatch scores of that
combination are lower than the lowest scores of the
complete chain(s) already constructed. Point (3) ensures
that each new complete chain always has lowest mismatch
scores that are higher than or at least equal to those
of the previously constructed chain(s). Note also that,
as a result of this process, all pairs in a given chain
are unique and all complete chains in the tree are
unique. The whole process terminates when the last
pathway to be explored stops. Keep the complete
chain(s) whose lowest mismatch scores are the highest.

Repeat section 1 starting with each of the 2016 pairs as
pair #1 to produce a set of 2016 overhang chains. Find
the best chain(s) by applying algorithm 2.



Algorithm 2

For all chains, compute the tables of unweighted
mismatch scores between all the pairs that are present
in the chain, and find the chain(s) for which the lowest
score in the table is the highest (see below for
details). If there is only one such chain, then select
it. If there are several chains, then compute the
weighted mismatch scores of the overhang comparisons
that gave the lowest unweighted score and find the
chain(s) for which the lowest weighted score is the
highest. If there is only one such chain, then select
it. If there are several chains, then redo the whole
procedure using the second lowest unweighted score in
the mismatch table, then the third lowest, and so on.
If several chains remain tied after all mismatch scores
have been considered, then keep all of them.

This allows the production of a set of one or more
overhang chains.

PART 2

Take a chain and execute section 2.

Section 2
Algorithm 3

For that chain, find the overhang pair(s) that is (are)
responsible for the lowest unweighted and weighted
scores in the table of mismatch scores between all pairs
in the chain. Then, create new chains by substituting
that pair with all remaining overhang pairs that are not
present in the original chain (if there are several
pairs to be substituted, substitute one pair at a time).
From the complete set of newly generated chains and the
original chain, select one or more chains following
algorithm 2. Here, including the original chain into
algorithm 2 ensures that the selected chains always have
a mismatch score that is higher than or at least equal
to the score of the original chain. The improvement (if
any) may involve the lowest or nth lowest unweighted
score, or the corresponding weighted score.

Repeat algorithm 3 for each selected chain. This
procedure generates a tree with a chain on each branch.
Each new chain which is added to the tree has a mismatch
score higher than or equal to the score of the chain
found in the previous step. A particular pathway is
stopped when the selected chain is one that has already
been found earlier. This ensures that all chains in the
tree are unique. The whole process terminates when the
last pathway to be explored stops. Keep all the chains
that are present in the tree.

Repeat section 2 (i.e., construct a tree) starting with
each of the chains selected at the end of part 1.

From the whole set of chains present in all trees,
select one or more chains following algorithm 2.

This produces a final set of one or more overhang
chains.

COMPUTATION OF MISMATCH SCORES

Unweighted score

The unweighted score for a ligation between two 6-base
overhangs is the number of mismatches observed,
considering the triplets of the first 3 and the last 3
bases separately. For example, the score for the
ligation AAAAAC/TTTGCA is 0-3 and the score for
AAAAAC/TCAGGG is 2-2. All possible scores are ranked
from highest to lowest according to the order below:

highest:  3-3
          3-2/2-3
          2-2
          3-1/1-3
          2-1/1-2
          1-1
          3-0/0-3
          2-0/0-2
lowest:   1-0/0-1
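
The unweighted score and its ranking can be sketched as follows. The two overhangs are written as in the text, aligned position by position, so a position counts as a mismatch when the lower base is not the Watson-Crick complement of the upper base. The numeric ranks (9 for 3-3 down to 1 for 1-0/0-1) are a convenience for sorting; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Rank of each score, following the order given in the text.
# Keys are (smaller triplet count, larger triplet count).
RANK = {(3, 3): 9, (2, 3): 8, (2, 2): 7, (1, 3): 6, (1, 2): 5,
        (1, 1): 4, (0, 3): 3, (0, 2): 2, (0, 1): 1}


def unweighted_score(top, bottom):
    """Mismatch counts for the first and last triplet of two aligned overhangs."""
    mismatches = [COMPLEMENT[a] != b for a, b in zip(top, bottom)]
    return sum(mismatches[:3]), sum(mismatches[3:])


def rank(score):
    """Rank of a score; 3-2 and 2-3 (and similar) are treated as equal.

    A perfect match (0-0) is not part of the ranking and raises KeyError.
    """
    return RANK[tuple(sorted(score))]


print(unweighted_score("AAAAAC", "TTTGCA"))  # (0, 3)
print(unweighted_score("AAAAAC", "TCAGGG"))  # (2, 2)
```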
Weighted score

The weighted score (WS) for a ligation is computed as
follows:

WS = 6 - \sum_{i=1}^{6} BPS_i

where BPS_i is the score for the particular base pair at
site i and is given in the table below:

AA = 1.0   CA = 0.6   GA = 1.0   TA = 0.0
AC = 0.6   CC = 1.0   GC = 0.0   TC = 0.6
AG = 1.0   CG = 0.0   GG = 0.9   TG = 0.2
AT = 0.0   CT = 0.6   GT = 0.2   TT = 0.6

For the perfect match between an overhang and its
complement, WS = 6.
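
A minimal sketch of the weighted score, using the base pair scores tabulated above. The function name is hypothetical, and rounding to one decimal is added here only to hide floating point noise.

```python
# Base pair scores (BPS) from the table: a perfect Watson-Crick pair scores 0
BPS = {
    "AA": 1.0, "CA": 0.6, "GA": 1.0, "TA": 0.0,
    "AC": 0.6, "CC": 1.0, "GC": 0.0, "TC": 0.6,
    "AG": 1.0, "CG": 0.0, "GG": 0.9, "TG": 0.2,
    "AT": 0.0, "CT": 0.6, "GT": 0.2, "TT": 0.6,
}


def weighted_score(top, bottom):
    """WS = 6 minus the sum of the base pair scores over the six sites."""
    total = sum(BPS[a + b] for a, b in zip(top, bottom))
    return round(6 - total, 1)


print(weighted_score("AAAAAC", "TTTTTG"))  # perfect match: 6.0
print(weighted_score("AAAAAC", "TCAGGG"))  # 2.4
```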

COMPARISON AMONG PAIRS AND CONSTRUCTION OF TABLES OF 
SCORES 

Finding the next overhang pair

To select the next overhang pair, tables of mismatch
scores between the pairs selected at previous positions
and all remaining pairs are computed. To construct such
a table, all previously selected pairs are compared with
the new pair and also every overhang is compared with
itself. Thus, if n pairs have already been selected, the
number of ligations considered for each table is 4n +
2(n + 1) = 6n + 2. When comparing two overhangs that are on
the same DNA strand, one of them is reversed.

Let us consider the following example where pairs
AAAAAC/TTTTTG (1A/1B) and AAACGT/TTTGCA (2A/2B) have
been chosen previously and the new pair AGTCCC/TCAGGG
(3A/3B) is tried at the next position.

The corresponding table is:

Comparison  Overhang  Ligation  Unweighted  Weighted
                                score       score

1 vs 1      1A        AAAAAC    3-3         0.8
            1A        CAAAAA
            1B        TTTTTG    3-3         3.2
            1B        GTTTTT
2 vs 2      2A        AAACGT    2-2         2.8
            2A        TGCAAA
            2B        TTTGCA    2-2         4.4
            2B        ACGTTT
3 vs 3      3A        AGTCCC    2-2         3.6
            3A        CCCTGA
            3B        TCAGGG    2-2         3.6
            3B        GGGACT
1 vs 3      1A        AAAAAC    3-2         2.6
            3A        CCCTGA
            1A        AAAAAC    2-2         2.4
            3B        TCAGGG
            1B        TTTTTG    2-2         4.0
            3A        AGTCCC
            1B        TTTTTG    3-2         4.6
            3B        GGGACT
2 vs 3      2A        AAACGT    3-2         2.7
            3A        CCCTGA
            2A        AAACGT    2-2         3.3
            3B        TCAGGG
            2B        TTTGCA    2-2         3.6
            3A        AGTCCC
            2B        TTTGCA    3-2         3.4
            3B        GGGACT

Here, the lowest score is 2-2; 2.4 given by the ligation
between overhangs 1A and 3B.
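
The table for the example above can be rebuilt programmatically. The sketch below is one possible reading of the procedure, with hypothetical function names: an overhang labelled "A" and one labelled "B" are compared as written, while two "A" overhangs or two "B" overhangs are treated as lying on the same strand, so the second one is reversed.

```python
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}
BPS = {"AA": 1.0, "CA": 0.6, "GA": 1.0, "TA": 0.0,
       "AC": 0.6, "CC": 1.0, "GC": 0.0, "TC": 0.6,
       "AG": 1.0, "CG": 0.0, "GG": 0.9, "TG": 0.2,
       "AT": 0.0, "CT": 0.6, "GT": 0.2, "TT": 0.6}
RANK = {(3, 3): 9, (2, 3): 8, (2, 2): 7, (1, 3): 6, (1, 2): 5,
        (1, 1): 4, (0, 3): 3, (0, 2): 2, (0, 1): 1}


def score(top, bottom):
    """Return (unweighted score, weighted score) for two aligned overhangs."""
    mism = [COMP[a] != b for a, b in zip(top, bottom)]
    unweighted = (sum(mism[:3]), sum(mism[3:]))
    weighted = round(6 - sum(BPS[a + b] for a, b in zip(top, bottom)), 1)
    return unweighted, weighted


def compare(x, y):
    """Score overhang x against y; y is reversed when both are on the same strand."""
    (name_x, seq_x), (name_y, seq_y) = x, y
    same_strand = name_x[-1] == name_y[-1]  # "A" vs "A" or "B" vs "B"
    return score(seq_x, seq_y[::-1] if same_strand else seq_y)


def next_pair_table(chosen, new):
    """Rows for the 6n + 2 ligations considered when trying a new pair."""
    rows = []
    for o in [o for pair in chosen + [new] for o in pair]:
        rows.append((o[0], o[0]) + compare(o, o))  # every overhang against itself
    for pair in chosen:  # each chosen pair against the new pair
        for x in pair:
            for y in new:
                rows.append((x[0], y[0]) + compare(x, y))
    return rows


chosen = [[("1A", "AAAAAC"), ("1B", "TTTTTG")],
          [("2A", "AAACGT"), ("2B", "TTTGCA")]]
new = [("3A", "AGTCCC"), ("3B", "TCAGGG")]

rows = next_pair_table(chosen, new)
lowest = min(rows, key=lambda r: (RANK[tuple(sorted(r[2]))], r[3]))
print(len(rows), lowest)  # 14 ('1A', '3B', (2, 2), 2.4)
```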

Score table for a chain 

To compute the table of mismatch scores for a chain, all
overhang pairs contained in the chain are compared with
each other and also every overhang is compared with
itself. Thus, for a chain of p overhang pairs, the
number of ligations considered is 4p(p-1)/2 + 2p =
2p^2. As above, one of the two overhangs is reversed
in the comparison when both are on the same DNA strand.
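
The count of 2p^2 can be checked by listing the comparisons explicitly. In this sketch (labels and function name are illustrative), each pair contributes an "A" and a "B" overhang, every overhang is compared with itself, and every overhang is compared with every overhang of the other pairs; the intended partner within the same pair is skipped.

```python
from itertools import combinations


def chain_comparisons(p):
    """List the overhang comparisons for a chain of p overhang pairs."""
    overhangs = [f"{i}{strand}" for i in range(1, p + 1) for strand in "AB"]
    self_rows = [(o, o) for o in overhangs]  # 2p self-comparisons
    cross_rows = [(x, y) for x, y in combinations(overhangs, 2)
                  if x[:-1] != y[:-1]]       # 4p(p-1)/2 cross-comparisons
    return self_rows + cross_rows


print(len(chain_comparisons(3)))  # 18, i.e. 2 * 3**2
```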

For example, let us consider the following 3-pair (i.e.,
4-position) chain: AAAAAC/TTTTTG (1A/1B), AAACGT/TTTGCA
(2A/2B), AGTCCC/TCAGGG (3A/3B) in which 1A is on one
fragment, 1B and 2A are on a second fragment, 2B and 3A
are on a third fragment and 3B is on a fourth fragment.

The corresponding table is:

Comparison  Overhang  Ligation  Unweighted  Weighted
                                score       score

1 vs 1      1A        AAAAAC    3-3         0.8
            1A        CAAAAA
            1B        TTTTTG    3-3         3.2
            1B        GTTTTT
2 vs 2      2A        AAACGT    2-2         2.8
            2A        TGCAAA
            2B        TTTGCA    2-2         4.4
            2B        ACGTTT
3 vs 3      3A        AGTCCC    2-2         3.6
            3A        CCCTGA
            3B        TCAGGG    2-2         3.6
            3B        GGGACT
1 vs 2      1A        AAAAAC    2-3         1.8
            2A        TGCAAA
            1A        AAAAAC    0-3         3.8
            2B        TTTGCA
            1B        TTTTTG    0-3         5.0
            2A        AAACGT
            1B        TTTTTG    2-3         3.8
            2B        ACGTTT
1 vs 3      1A        AAAAAC    3-2         2.6
            3A        CCCTGA
            1A        AAAAAC    2-2         2.4
            3B        TCAGGG
            1B        TTTTTG    2-2         4.0
            3A        AGTCCC
            1B        TTTTTG    3-2         4.6
            3B        GGGACT
2 vs 3      2A        AAACGT    3-2         2.7
            3A        CCCTGA
            2A        AAACGT    2-2         3.3
            3B        TCAGGG
            2B        TTTGCA    2-2         3.6
            3A        AGTCCC
            2B        TTTGCA    3-2         3.4
            3B        GGGACT

Here, the lowest score is 0-3; 3.8 given by the ligation
between overhangs 1A and 2B.

Results obtained:
Table of breaking points

PART 1

# of        Unweighted  Weighted  # of equal
positions   score       score     chains

3           3-3         1.6       48
4           2-2         4.0       48
9           2-2         2.5       12
10          3-1         3.2       12
14          3-1         2.4       6
15          2-1         4.6       6
33          2-1         3.0       12
34          3-0         4.6       12
90          3-0         3.1

PART 2

# of        Unweighted  Weighted  # of equal
positions   score       score     chains

3           3-3         1.6       48
4           3-2         2.2       48
9           2-2         2.5       12
10          3-1         3.2       12
14          3-1         2.4       6
15          3-1         2.0       6
33          2-1         3.0       12
34          3-0         4.6       12
90

It will be noted that the unweighted mis-match score (in
which 9 = 3-3, 8 = 3-2, 7 = 2-2, 6 = 3-1, 5 = 2-1, 4 =
1-1, 3 = 3-0, 2 = 2-0, 1 = 1-0) reduces as the number of
positions increases.

Samples of chains obtained at the end of part 1 and at
the end of part 2

3 positions (this chain is obtained at the end of both
parts):

AACTCG/TTGAGC
TCTCAC/AGAGTG

4 positions:

part 1
AATTGG/TTAACC
TGCCAC/ACGGTG
ATAGTC/TATCAG

part 2
AATGGG/TTACCC
TCGGAC/AGCCTG
TTAACG/AATTGC

9 positions (this chain is obtained at the end of both
parts):

AATCAC/TTAGTG  TACACG/ATGTGC  AGGCTG/TCCGAC
TGAGGG/ACTCCC  ACATTC/TGTAAG  TTTAGC/AAATCG
TCGGAT/AGCCTA  GGCTAG/CCGATC

10 positions (this chain is obtained at the end of both
parts):

AAAACC/TTTTGG  AGGCTC/TCCGAG  TCGATA/AGCTAT
TTGGGG/AACCCC  GTCATG/CAGTAC  ATTCAG/TAAGTC
TCATAG/AGTATC  TGCAGT/ACGTCA  AGAGAT/TCTCTA

14 positions (this chain is obtained at the end of both
parts):

ACGTGC/TGCACG  GTTGGC/CAACCG  TCAGCC/AGTCGG
TATGAG/ATACTC  TTGCGG/AACGCC  AGAGGG/TCTCCC
TGCACG/ACGTGC  AGTATC/TCATAG  CACCGC/GTGGCG
ATACAC/TATGTG  TGACTA/ACTGAT
AACTTG/TTGAAC  ACTCCG/TGAGGC

15 positions:

part 1
AAAACC/TTTTGG  TGCAGT/ACGTCA  AAGTAA/TTCATT
TTGGGG/AACCCC  TCGATA/AGCTAT  CCGTCC/GGCAGG
TCATAG/AGTATC  ATTCAG/TAAGTC  TGTAAC/ACATTG
AGGCTC/TCCGAG  AGAGAT/TCTCTA  ACCGTG/TGGCAC
GTCATG/CAGTAC  TACTTC/ATGAAG

part 2
AAAACC/TTTTGG  TCTGCT/AGACGA  AAGTAA/TTCATT
TTGGGG/AACCCC  TCGATA/AGCTAT  CCGTCC/GGCAGG
TCATAG/AGTATC  ATTCAG/TAAGTC  TGTAAC/ACATTG
AGGCTC/TCCGAG  AGAGAT/TCTCTA  ACCGTG/TGGCAC
GACAAG/CTGTTC  TACTTC/ATGAAG

33 positions (this chain is obtained at the end of both
parts):

AACTAG/TTGATC  GTAAGG/CATTCC  TCGCCT/AGCGGA
TGGAGC/ACCTCG  AAACTA/TTTGAT  TCTCGG/AGAGCC
TCAAAT/AGTTTA  GTCTCC/CAGAGG  ACCCCC/TGGGGG
CAGGCC/GTCCGG  ACAGCG/TGTCGC  TTTTCG/AAAAGC
TATCAC/ATAGTG  CACATC/GTGTAG  AAGTCA/TTCAGT
AGATTC/TCTAAG  TGTGTA/ACACAT  GTTCTC/CAAGAG
TTCCGT/AAGGCA  TAATGC/ATTACG
CCCACG/GGGTGC  GGTAAG/CCATTC
ATGCCG/TACGGC  AGTTAT/TCAATA
TCCGTC/AGGCAG  CAACAG/GTTGTC
CCACGC/GGTGCG  ATCGGC/TAGCCG
ACTATG/TGATAC  AATGCT/TTACGA
TTAGCA/AATCGT  TTGGAG/AACCTC

34 positions (this chain is obtained at the end of both
parts):

AACTCT/TTGAGA  TTATTC/AATAAG  CCAATC/GGTTAG
TCGAAC/AGCTTG  CACAAG/GTGTTC  ACTTAT/TGAATA
CAGGGC/GTCCCG  TCCGAT/AGGCTA  AAAGAG/TTTCTC
TAAAGG/ATTTCC  AGTAGC/TCATCG  TTGATA/AACTAT
TGTGCG/ACACGC  CCGTCG/GGCAGC  AAGACC/TTCTGG
ATGTAG/TACATC  TCACTA/AGTGAT  CAATCC/GTTAGG
TTCCCC/AAGGGG  GTGACG/CACTGC  TCTCGC/AGAGCG
AATCTC/TTAGAG  TGAAAT/ACTTTA  AGGGGG/TCCCCC
TGGCGT/ACCGCA  AGCATG/TCGTAC  TGCCAG/ACGGTC
GGCTGC/CCGACG  ACCGTC/TGGCAG  TACTAC/ATGATG
TTTGAC/AAACTG  ACACCG/TGTGGC  TGAGGC/ACTCCG

90 positions (this chain is obtained at the end of part
1):

AAAAAA/TTTTTT  [illegible]    [illegible]
[illegible]    [illegible]    [illegible]
AGGTAG/TCCATC  [illegible]    [illegible]
TCCATC/AGGTAG  [illegible]    [illegible]
ATCTGC/TAGACG  [illegible]    [illegible]
TAGACG/ATCTGC  [illegible]    [illegible]
ACTGTG/TGACAC  [illegible]    [illegible]
TGACAC/ACTGTG  [illegible]    [illegible]
CATTAC/GTAATG  [illegible]    [illegible]
ACCCCA/TGGGGT  [illegible]    [illegible]
[illegible]    [illegible]    AGTATC/TCATAG
[illegible]    [illegible]    TCATAG/AGTATC
[illegible]    [illegible]    [illegible]
[illegible]    [illegible]    [illegible]
[illegible]    [illegible]    ATGCAC/TACGTG
[illegible]    [illegible]    TACGTG/ATGCAC
AATGCT/TTACGA  [illegible]    [illegible]
TTACGA/AATGCT  AGCCCC/TCGGGG  TGATTG/ACTAAC
AAGCGC/TTCGCG  TCGGGG/AGCCCC  CAGTGC/GTCACG
TTCGCG/AAGCGC  ACCATG/TGGTAC  GTCACG/CAGTGC
CCCAAG/GGGTTC  TGGTAC/ACCATG  AATAAG/TTATTC
GGGTTC/CCCAAG  AGGGGA/TCCCCT  TTATTC/AATAAG
ACATCC/TGTAGG  CTAATC/GATTAG  AGATAT/TCTATA
TGTAGG/ACATCC  CGAGAG/GCTCTC  TCTATA/AGATAT
AACTTG/TTGAAC  GCTCTC/CGAGAG  AAGTCG/TTCAGC
TTGAAC/AACTTG  ACACGT/TGTGCA  TTCAGC/AAGTCG
ATAGAC/TATCTG  TGTGCA/ACACGT  AATCGA/TTAGCT
TATCTG/ATAGAC  CCTGTC/GGACAG  TTAGCT/AATCGA
AGACCG/TCTGGC  GGACAG/CCTGTC  AGGCTC/TCCGAG
TCCGAG/AGGCTC
CGGGGC/GCCCCG

EXAMPLE 7 - CONSTRUCTION OF A 5-FRAGMENT CHAIN ENCODING
THE BINARY SEQUENCE 1-0-1-0-0

This experiment demonstrates the construction of a
specific 5 fragment chain using a set of four
non-palindromic 5' 6 base overhang pairs. The set of
four unique overhang pairs was found using a computer
program as described in Example 6.

Based upon the overhang pairs, a set of five library
components was made by annealing complementary
oligonucleotides in separate tubes:
signal 1:
5'-TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC-3'
and
5'-TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA-3';
signal 2:
5'-TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT-3' and
5'-GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA-3';
signal 3:
5'-AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG-3' and
5'-CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA-3';
signal 4:
5'-ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC-3' and
5'-GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT-3';
signal 5:
5'-CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC-3'
and
5'-GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT-3';

T7: 5'-TAATACGACTCACTATACCA-3'
T7-Cy5 primer: 5'-TAATACGACTCACTATA-3'
SP6 primer: 3'-AAGATATCACAGTGGATTTAG-5'

The library components (4 pmol each) were then mixed
together and ligated using 100 U T4 DNA ligase (NEB) in
1X ligase buffer at 25°C for 15 minutes. The ligase was
then inactivated at 65°C for 20 min.

5 µl of the ligation reaction (50 µl) was used as template
in a PCR reaction (50 µl) containing 1X Thermopol buffer
(NEB), 0.05 mM dNTPs, 0.4 µM T7 primer, 0.4 µM SP6
primer and 0.04 U/µl Vent polymerase (NEB). The PCR was
hot started (95°C for 3 minutes before addition of
polymerase) and cycled 30 times; 95°C, 30 sec; 55°C, 30
sec; 76°C, 30 sec, using a PTC-200 thermo cycler (MJ
Research). 10 µl of the PCR was analysed on a 1.5%
agarose gel as shown in Figure 5. The gel picture showed
only one intense band corresponding to approximately 240
bp as expected (243 bp). The remaining PCR product was
extracted twice with chloroform and precipitated using
71% ethanol and 0.1 M NaAc. The DNA was dissolved in
water and sequenced. The sequence confirmed that the
expected signal chain (1-0-1-0-0) was generated.
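
The design of the five library components can be checked with a short sketch. It uses only the 6 base overhangs at the junctions and the strand lengths, both read off the sequences as listed above (these counts are an assumption worth rechecking against the original listing). Each 5' overhang of one component should be the reverse complement of the first 6 bases of the next component, and the lengths of the first-listed strands should add up to the expected product size.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# First 6 bases (5' overhang) of the second oligonucleotide of signals 1-4
left_overhangs = ["TAGGAA", "GCAACT", "CGACAT", "GACAGG"]
# First 6 bases of the first oligonucleotide of signals 2-5
right_overhangs = ["TTCCTA", "AGTTGC", "ATGTCG", "CCTGTC"]
# Lengths of the first-listed strand of signals 1-5
top_lengths = [50, 45, 45, 45, 58]

junctions_ok = all(
    reverse_complement(left) == right
    for left, right in zip(left_overhangs, right_overhangs)
)
print(junctions_ok, sum(top_lengths))  # True 243
```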

EXAMPLE 8 - CONSTRUCTION OF A 5X5 FRAGMENT CHAIN
ENCODING THE BINARY SEQUENCE USING ONE LIGATION CYCLE
FOLLOWED BY ONE PCR CYCLE OR BY TWO LIGATION CYCLES

This experiment demonstrates the use of complementary
primer pairs to link fragment chains together as an
alternative to the ligation strategy demonstrated in the
previous example.

In this experiment 5 fragment chains with 5 positions
(fragments or bits) each are ligated separately in
ligation cycle 1 as demonstrated earlier (Example 7).
The 5 fragment chains are then amplified with 5
different primer pairs (pair 1 is used to amplify chain
1, pair 2 is used to amplify chain 2, etc). The second
primer in primer pair 1 is complementary to the first
primer in primer pair 2, the second primer in primer pair
2 is complementary to the first primer in primer pair 3,
and so on.

A small aliquot is then taken from each of the 5 PCR
reactions and a new PCR reaction is performed with
primers that are specific to the ends of signal chains 1
and 5. The method is illustrated in Figure 6.

Materials : 

Oligonucleotides are selected which bind to the fragment
chain and also serve as primers. Thus, for example,
adjacent chains may be bound using the following
primer pairs:

fragment chain 2 terminal (with bound primer) : 
TTCTATAGTGTCACCTAAATC 

AAGATATCACAGTGGATTTAGCCTACCAGTACATCCAACGGCAACT 

fragment chain 3 terminal (with bound primer) : 
GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA 

ATTATGCTGAGTGATATCGT 

The above exemplified primer regions are complementary 
and may thus be bound together. 

As an alternative to this method, two ligation cycles
may be used in which 5 fragment chains (generated by
ligation) are ligated together. Several construction
cycles may thus be used to build up long signal chains.
After the initial ligation in the first ligation cycle,
the 5 fragment chains are amplified with primers
containing a FokI site. The primers are selected such
that digestion with FokI produces non-palindromic
overhangs at the ends of each fragment chain, in which
the overhang generated in fragment chain 1 is able to
ligate with the first overhang generated in fragment
chain 2, the second overhang generated in fragment chain
2 is able to ligate with the first overhang generated in
fragment chain 3, and so on. The 5 fragment chains can
thereby be ligated together in a controlled manner to
generate a final chain with 25 fragments (bits).
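
The ordering effect of the non-palindromic overhangs can be sketched as follows. This is an illustration only: the 4 base overhangs and the bit patterns are hypothetical and are not taken from the experiment, and each overhang is written 5' to 3' on its own strand, so that two ends can ligate when one overhang is the reverse complement of the other.

```python
from typing import List, NamedTuple, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


class Chain(NamedTuple):
    left: Optional[str]   # overhang at the left end (None = outer end of the final chain)
    bits: str             # the 5 bits carried by this fragment chain
    right: Optional[str]  # overhang at the right end (None = outer end)


def assemble(chains: List[Chain]) -> str:
    """Join chains in the only order permitted by their overhangs."""
    by_left = {c.left: c for c in chains if c.left is not None}
    starts = [c for c in chains if c.left is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one chain without a left overhang")
    current = starts[0]
    bits = [current.bits]
    while current.right is not None:
        if len(bits) > len(chains):
            raise ValueError("overhangs form a cycle")
        current = by_left[reverse_complement(current.right)]
        bits.append(current.bits)
    return "".join(bits)


# Hypothetical chains, deliberately listed out of order.
pool = [
    Chain("ACCT", "00111", "CATT"),  # chain 4
    Chain(None, "10100", "AACG"),    # chain 1
    Chain("AATG", "10010", None),    # chain 5
    Chain("CGTT", "01101", "ACTG"),  # chain 2
    Chain("CAGT", "11000", "AGGT"),  # chain 3
]

final_chain = assemble(pool)
print(len(final_chain), final_chain)
```

Because no overhang in the pool is its own reverse complement, a chain cannot ligate to a second copy of itself, and only the order 1-2-3-4-5 is possible.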

If we want to construct fragment chains with 100 or 500
fragments we can repeat this procedure 1 or 2 more
times. The polymerase capacity will, however, be a
limiting factor regarding how many ligation cycles it is
possible to perform. Other strategies will therefore
need to be employed to construct even longer chains.
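
The relationship between the number of cycles and the chain length can be worked through with a short calculation. The sketch assumes that every cycle joins 5 units, as in this example, so that the chain length after c cycles is 5 to the power c. The function names are hypothetical.

```python
def fragments_after(cycles: int, units_per_cycle: int = 5) -> int:
    """Chain length (in fragments) after the given number of joining cycles."""
    return units_per_cycle ** cycles


def cycles_needed(target: int, units_per_cycle: int = 5) -> int:
    """Smallest number of cycles giving at least `target` fragments."""
    cycles, length = 0, 1
    while length < target:
        length *= units_per_cycle
        cycles += 1
    return cycles


for target in (25, 100, 500):
    c = cycles_needed(target)
    print(target, c, fragments_after(c))
# prints:
# 25 2 25
# 100 3 125
# 500 4 625
```

On this assumption, chains of at least 100 or 500 fragments require one or two cycles beyond the two used for the 25 fragment chain.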

EXAMPLE 9: CLONING OF AN INSERT FROM PHIX174 INTO PUC19
WITH A TRIMMED GENE A

This experiment demonstrates the "trimming" strategy for
elimination of unwanted flanking sequences. Another
important aspect of this experiment is that we
demonstrate that it is possible to link a 5' and a 3'
overhang together with a single stranded oligonucleotide
alone. It should also be noted that the inserts are
cloned into two different IIS sites, thereby eliminating
the problem of insert concatemerisation.

In this method, Gene A from PhiX174 is cloned into a
pUC-19 vector. PhiX174 is prepared by cleavage with
BbvI, resulting in 15 fragments flanked by different
non-palindromic 5' overhangs of 4 bases, as described in
more detail in Example 1. The two overhangs adjacent to
Gene A are then addressed with "initiation linkers"
containing a BplI site, while the rest of the fragments
are allowed to religate. T4 DNA ligase, BplI, a
"propagation linker" containing a BplI site, and two
"termination adaptors" addressed to the first and last
five bases of Gene A respectively are used. The
solution is incubated at 37°C, thereby allowing the
trimming reaction to proceed until it is terminated when
the first and last five bases of Gene A are reached.

The pUC-19 vector is prepared by cleavage with HgaI and
BsaI. The overhangs generated by HgaI cleavage are
described in Example 1. Cleavage with BsaI results in 4
non-identical cleavages giving rise to 8 non-identical
overhangs, e.g. site 1 - GCCA/CGGT (1600).

Gene A has the following sequence at its first and last 
five bases (marked by underlining) . 

...GCTGGAGGCCTCCACTATGAAATCGCGTAGAG...
...CGACCTCCGGAGGTGATACTTTAGCGCATC

GTGGCGGAAAATGAGAAAATTCGACCTA...
...ACGACCGCCTTTTACTCTTTTAAGCTGG

When terminating the trimming procedure at the
underlined sequences it is possible to clone Gene A
without any unwanted flanking base pairs. The 3' 5-base
overhangs generated by BplI correspond to the marked
base pairs.

The overhang pair generated by HgaI and BsaI in pUC19
that is used as a cloning site for Gene A from
PhiX174 is TTCTC/CGGT.

Method : 

This is as described in Example 1 except that pUC19 is
cut with HgaI (NEB 4, 37°C) and thereafter with
BsaI (NEB 4, 50°C).

Materials :

Initiation linker 1 (s):
5' ATT CGG TCG AGA TGC TCT CA 3'

Initiation linker 1 (as):
5' CGA CTG AGA GCA TCT CGA CCG AAT 3'

Initiation linker 2 (s):
5' GCG TTA CTG AGC GTA GCT CTG 3'

Initiation linker 2 (as):
5' CTC TCA GAG CTA CGC TCA GTA ACG C 3'

Propagation linker (s):
5' TGC TGC AGG AGC GAA TCT CNN NNN 3'

Propagation linker (as):
5' GAG ATT CGC TCC TGC AGC A 3'

Labeling linker 2 (s):
5' CTC TTG CTA TAG TGA GTC GTA TTA 3'

Labeling linker 2 (as):
5' TAA TAC GAC TCA CTA TAG CA 3'

Termination linker 1 (s):
5' AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA 3'

Termination linker 1/2 (as):
5' AGC TAC GTC AAT GAC CTG AG 3'

Termination linker 1 (short version):
5' AAG AGA TGA A 3'

Termination linker 2 (s):
5' ACC GCT CAG GTC ATT GAC GTA GCT TCA TT 3'

Termination linker 2 (short version):
5' ACC GTC ATT 3'
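
The presence of a BplI site in the sense strands listed above can be checked with a simple scan. The sketch assumes that the BplI recognition sequence is GAG(N)5CTC; this is taken from general knowledge of the enzyme and is not stated in this text. Because that sequence reads the same on both strands, only the sense strands are scanned. The function name is hypothetical.

```python
import re

# Assumed BplI recognition sequence: GAG, any five bases, CTC.
BPLI_SITE = re.compile(r"GAG[ACGTN]{5}CTC")

# Sense-strand sequences copied from the list above.
LINKERS = {
    "Initiation linker 1 (s)": "ATT CGG TCG AGA TGC TCT CA",
    "Initiation linker 2 (s)": "GCG TTA CTG AGC GTA GCT CTG",
    "Propagation linker (s)": "TGC TGC AGG AGC GAA TCT CNN NNN",
    "Labeling linker 2 (s)": "CTC TTG CTA TAG TGA GTC GTA TTA",
    "Termination linker 1 (s)": "AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA",
    "Termination linker 2 (s)": "ACC GCT CAG GTC ATT GAC GTA GCT TCA TT",
}


def has_bpli_site(seq: str) -> bool:
    """True if the sequence (spaces ignored) contains GAG(N)5CTC."""
    return BPLI_SITE.search(seq.replace(" ", "")) is not None


results = {name: has_bpli_site(seq) for name, seq in LINKERS.items()}
for name, found in results.items():
    print(f"{name}: {'site found' if found else 'no site'}")
```

Under this assumption the scan finds a site in both initiation linkers and in the propagation linker, and none in the labeling or termination linkers, which is consistent with trimming continuing from the former and stopping at the latter.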

The efficiency of the trimming reaction may be assessed
as follows. Overhang 6) is addressed with a γ-32P
labelled adaptor. The trimming reaction is then allowed
to start from overhang 1). Aliquots are taken out at
regular time intervals and the size distribution of
the DNA fragments is then analysed on a gel.

Claims :

1. A method of attaching a fragment of a first nucleic 
acid molecule to a second nucleic acid molecule, wherein 
said method comprises at least the steps: 

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create at least one fragment of said
first nucleic acid molecule having a single stranded
nucleotide region (SS1a) at at least one terminus of
said fragment,

2) if necessary generating a single stranded
nucleotide region (SS2) at at least one terminus of said
second nucleic acid molecule,

3) binding to at least one single stranded region of
step 1) (SS1a) an adapter molecule comprising at one
terminus a single stranded region (SSA1) complementary
to the single stranded region of said first nucleic acid
molecule fragment (SS1a) and additionally comprising at
the other terminus a further single stranded region
(SSA2) complementary to the single stranded region (SS2)
at one terminus of said second nucleic acid molecule,

4) ligating said adapter to said first nucleic acid
fragment,

5) binding said adapter to said second nucleic acid
molecule, and

6) ligating said adapter to said second nucleic acid
molecule.

2. A method as claimed in claim 1 wherein said first
nucleic acid molecule fragment has a single stranded
nucleotide region at either terminus (SS1a and SS1b),
each of which is bound by an adapter, which may be the
same or different, and the first of said adapters is
bound to said second nucleic acid molecule and the
second of said adapters binds either to said second
nucleic acid molecule or to a third nucleic acid
molecule.

3. A method as claimed in claim 2, wherein said
adapters bind to the termini of said second nucleic acid
molecule, thereby forming a circular nucleic acid
molecule.

4. A method as claimed in any one of claims 1 to 3,
wherein said second nucleic acid molecule is a vector or
a fragment thereof and single stranded regions are
produced in step 2) by cleavage of said vector with a
nuclease.

5. A method as claimed in any one of claims 1 to 4,
wherein said adapter molecule additionally comprises one
or more nuclease recognition and cleavage sites.

6. A method as claimed in any one of claims 1 to 5,
wherein said nuclease is a restriction enzyme from the
class of IP or IIS enzymes.

7. A method as claimed in any one of claims 1 to 6,
wherein two or more fragments of the first nucleic acid
molecule are attached to different second and optionally
third nucleic acid molecules, or different termini
thereof.

8. A method as claimed in any one of claims 4 to 7,
wherein one or more fragments of said first nucleic acid
molecule are attached via adapters to single stranded
regions in said second nucleic acid molecule resulting
from different cleavage events.

9. A method as claimed in claim 7 or 8, wherein one or
more fragments of said first nucleic acid molecule are
attached via adapters to single stranded regions in two
or more second nucleic acid molecules.

10. A method as claimed in any one of claims 1 to 9,
wherein 2 or more first nucleic acid molecules are
cleaved and bound to one or more second nucleic acid
molecules by adapter molecules simultaneously in the
same reaction.

11. A method as claimed in any one of claims 1 to 10,
wherein all the steps are conducted together.

12. A nucleic acid molecule produced according to a
method as defined in any one of claims 1 to 11.

13. A cloning or expression vector containing the
nucleic acid molecule as defined in claim 12.

14. A eukaryotic or prokaryotic cell or transgenic
organism containing a vector as defined in claim 13.

15. A kit for attaching a first nucleic acid molecule
fragment to a second nucleic acid molecule or a fragment
thereof according to the method defined in any one of
claims 1 to 11 comprising at least (i) one or more
adapters as described in any one of claims 1 to 9, (ii)
the second nucleic acid molecule and (iii) a nuclease
which cleaves outside its recognition site, wherein the
terminus of one of said adapters has a single stranded
region complementary to a single stranded region
generated on said second nucleic acid molecule after
cleavage with said nuclease.

16. A method of synthesizing a double stranded nucleic
acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid
fragments, wherein at least n-2 fragments have single
stranded regions at both termini and 2 fragments have
single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid
fragments, simultaneously or consecutively, to effect
binding of said complementary pairs of single stranded
regions, and

3) optionally ligating said complementary pairs
simultaneously or consecutively to produce a nucleic
acid molecule consisting of n fragments.

17. A method as claimed in claim 16 wherein said 
fragments are each between 8 and 25 bases in length. 

18. A method as claimed in claim 16 or 17 wherein n is
at least 10.

19. A method as claimed in any one of claims 16 to 18
wherein said fragment comprises a region representing a
unit of information corresponding to one or more code
elements.

20. A method as claimed in claim 19 wherein said code 
is alphanumeric. 

21. A method as claimed in claim 20 wherein said code
is binary.

22. A method as claimed in any one of claims 19 to 21
wherein each of said one or more code elements has the
formula

(X)a,
wherein

X is a nucleotide A, T, G, C or a derivative
thereof which allows complementary binding and may be
the same or different at each position, and
a is an integer from 4 to 10,
wherein (X)a is different for each one or more code
elements.

23. A method as claimed in claim 22, wherein said code
is binary and the code elements "1" and "0" have the
formulae:

"0" = (X)a and "1" = (Y)b,
wherein

(X)a and (Y)b are not identical,

X and Y are each a nucleotide A, T, G, C or a
derivative thereof which allows complementary binding
and may be the same or different at each position, and

a and b are integers from 4 to 10.

24. A method as claimed in claim 23 wherein in the
formulae (X)a and (Y)b, X and Y are the same at each
position.

25. A method of synthesizing a double stranded nucleic 
acid molecule comprising at least the steps of : 

1) generating fragment chains according to the method 
defined in any one of claims 16 to 24; 

2) optionally generating single stranded regions at 
the end of said fragment chains, wherein said single 
stranded regions are complementary to other single 
stranded regions on said fragment chains thus forming 
complementary pairs of single stranded regions; 

3) contacting said fragment chains with one another, 
simultaneously or consecutively, to effect binding of 
said complementary pairs of single stranded regions. 

26. A nucleic acid molecule produced according to a
method as defined in any one of claims 16 to 25, or a
single stranded nucleic acid molecule thereof.

27. A method of identifying the code elements contained
in a nucleic acid molecule prepared according to a
method as defined in any one of claims 16 to 25, wherein
a probe, carrying a signalling means, specific to one or
more code elements, is bound to said nucleic acid
molecule and a signal generated by said signalling means
is detected, whereby said one or more code elements may
be identified.

28. A library of fragments as defined in any one of
claims 16 to 27, comprising (n)m fragments, wherein n is
as defined in any one of claims 16 to 27 and corresponds
to the length of chain that said library may produce,
and m is an integer corresponding to the number of
possible code elements or combinations thereof, such
that fragments corresponding to all possible code
elements for each position in the final chain are
provided.
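
The library described in claim 28 and the binary code elements of claims 23 and 24 can be illustrated with a small sketch. It rests on several assumptions that are not taken from the claims: the code elements are the homopolymers G8 for "0" and A8 for "1" (a = b = 8, within the range 4 to 10), the library is read as holding one fragment per position and code element (n x m entries), and the position-specific overhangs are represented only by the position index. Reading the chain back by string comparison stands in for the probe-based detection of claim 27. All names are hypothetical.

```python
from itertools import product

# Hypothetical code elements: "0" = (G)8 and "1" = (A)8.
ELEMENTS = {"0": "G" * 8, "1": "A" * 8}
WIDTH = 8


def build_library(n: int, elements=ELEMENTS) -> dict:
    """One fragment for every (position, code element) combination."""
    return {
        (pos, symbol): seq
        for pos, (symbol, seq) in product(range(1, n + 1), elements.items())
    }


def encode(bits: str, library: dict) -> str:
    """Pick the fragment for each position and join them in order."""
    return "".join(library[(pos, bit)] for pos, bit in enumerate(bits, start=1))


def decode(chain: str, elements=ELEMENTS, width: int = WIDTH) -> str:
    """Read the chain back one code element at a time."""
    lookup = {seq: symbol for symbol, seq in elements.items()}
    return "".join(lookup[chain[i:i + width]] for i in range(0, len(chain), width))


library = build_library(n=5)
chain = encode("10100", library)
print(len(library), chain, decode(chain))
```

With n = 5 positions and m = 2 code elements the sketch holds 10 fragments, from which any 5 bit chain such as 1-0-1-0-0 can be composed.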

FIG. 2: "0" starting fragments and "1" starting fragments for positions 1, 2, 7 and 8.



FIG. 3: Fragment 0 and Fragment 1 at positions 1.1, 1.2, 8.1, 8.2 and 8.3.

FIG. 4: Library with "0" signals and library with "1" signals, by position.